Processing event instance data in a client-server architecture

ABSTRACT

A process analysis system (1-300) processes event data describing real-world processes (1-100). The process analysis system performs the following acts: importing event instance data sets from an information management system (1-200), each set comprising one or more attributes describing an event instance in the real-world process (1-100); for each event instance, determining a corresponding process instance based on at least the attributes; determining event order attribute(s) for each imported event instance data set based on other event instance data sets corresponding to the same process instance; forming an analysis result set based on at least the event instance data sets and at least one first or second attribute; the client(s) presenting an analysis utilizing the analysis result set.

PARENT CASE INFORMATION

The present invention claims priority from commonly owned US provisional patent application No. 61/598,935, filed 15 Feb. 2012 and titled similarly to the present application. The present invention further claims priority from five commonly owned Finnish patent applications 20125169, 20125176, 20125170, 20125173 and 20125174, all of which were filed on 15 Feb. 2012 and have the same title as the present invention.

FIELD OF THE INVENTION

The present invention broadly relates to the analysis of processes in which a large number of events take place for a large number of process instances. Such processes can be, for example, logistic processes where goods and/or information are transported between locations and the transportation of goods and/or information is tracked for business-related purposes. A logistic process comprises several steps, such as negotiations between customer and service provider (wherein “service” may comprise delivery of physical objects), bids, contracts, manufacturing (eg software and/or hardware), testing, packing, delivery, or the like. In a logistic process a computer system controls and monitors the logistic process in which hardware, software and/or service is negotiated between supplier and customer and delivered from supplier to customer. Another example of such processes is a healthcare process, where a large number of events take place during treatment, for example: first aid, doctor visit, surgery and ward visit. Yet another example is a management process for patent applications, in which several events are recorded for each application, such as “application written”, “application filed”, “USPTO request”, “payment for patent”, “patent granted”. A still further example is a management process for sales processes, where events related to the sales process are recorded into a CRM system. Yet another example is the help desk/service desk process, where events are recorded into a case management/service desk system. Yet another example is a human resource process, where for each employee of a company events like “recruited”, “salary increased”, “absence”, “holiday”, “promoted to supervisor”, “course X taken”, “employment terminates” are recorded into an HR system.

More particularly, the invention relates to computer-implemented methods and equipment for automated modelling of processes, in which a process analysis computer cannot be directly coupled to the underlying process that actually results in the delivery of hardware, software and/or service from supplier to customer.

BACKGROUND OF THE INVENTION

FIG. 1 shows an overall view of an environment wherein the invention can be utilized for a logistic process. The environment shown in FIG. 1 is intended to illustrate rather than restrict the invention. Reference numeral 1-100 denotes an exemplary logistic process in the real world. The logistic process 1-100 comprises various process steps, including real-world delivery of hardware, software and/or service from supplier to customer. In the context of the present invention, any business processes relating to the logistic process, such as negotiations, bidding or invoicing, are considered part of the logistic process 1-100. Reference numeral 1-200 denotes an information management system, which supports the logistic process 1-100. In an illustrative but non-restrictive example, the information management system, of which there may be more than one, comprises a resource-planning system. The acronym ERP stands for Enterprise Resource Planning, which term is frequently used in connection with logistic processes. The invention is not restricted to environments wherein the information management systems 1-200 meet a strict definition of ERP system, however. Reference numeral 1-300 generally denotes computer-implemented analysis tools or an analysis system. It is generally known that computer-implemented analysis systems, such as the one denoted by reference numeral 1-300, can be used to analyse real-world processes, such as the one denoted by reference numeral 1-100.

One of the problems associated with the environment relates to the fact that while it is generally desirable to analyse the efficiency of the logistic process 1-100 by the computer-implemented analysis system 1-300, and thereby locate bottlenecks and problem spots in the logistic process 1-100, it is normally impossible to couple the computer-implemented analysis system 1-300 directly to the logistic process 1-100. In FIG. 1, this problem is illustrated by the fact that the arrow between the logistic process 1-100 and the computer-implemented analysis system 1-300 is broken. Obviously, there is a motivation to couple the computer-implemented analysis tools 1-300 with the logistic process 1-100 indirectly, via the ERP system 1-200. This indirect coupling, in turn, generates additional problems or questions, such as how to program the analysis system 1-300 to obtain data that is relevant to the problem of discovering bottlenecks and problem spots in the logistic process. For the purposes of the ERP system 1-200 it suffices that the various events in the logistic process 1-100 are recorded in the ERP system 1-200, but the ERP system 1-200 is not programmed to discover process bottlenecks. In fact, a typical ERP system 1-200 is ignorant of cause-effect or predecessor-successor relations between the events in the logistic process 1-100.

Another problem relates to the fact that the number of individual processes (process instances) in a typical ERP system is huge. Each open or completed order is an instance of a process that differs from all the other process instances at some level of detail. On the other hand, if all process instances are generalized to just the two end nodes (namely start and end), all processes end up being the same. The question, then, is how to generalize processes such that classes of processes begin to emerge, wherein certain classes of processes tend to exhibit various problems, such as long processing times, convoluted process flows, or the like.

The question of discovering potentially problematic process classes is not merely a question of obtaining cognitive information. In addition, there is clearly a technical problem of how to perform the process analysis (in the analysis system 1-300) with sufficient efficiency, such that interactive real-time analysis is possible. Let us assume, firstly, that all data describing the ERP process resides in the database of the ERP system 1-200 and that data is to be analysed in the analysis system 1-300. Those skilled in the art will realize that the amounts of data can be enormous and that the bandwidth of data communications between the ERP system 1-200 and the analysis system 1-300 prevents any kind of interactive real-time analysis wherein data has to be transferred in real time between the ERP system 1-200 and the analysis system 1-300.

Let us next assume that all data from the ERP system 1-200 is mirrored locally in the analysis system 1-300, in a normalized mode wherein all data items are stored exactly once. In such a system, it is the bandwidth between the database and server of the analysis system 1-300 that precludes real-time analysis if any non-indexed database operations spanning the entire database are needed. There is thus a motivation to cache some intermediate results to speed up the analysis. On the other hand, if too much of the intermediate results are cached in the analysis system 1-300, the problem is that any changes in the events of the logistic process 1-100, or its model in the ERP system 1-200, render the cached intermediate results obsolete. Also, conducting the analysis itself may result in excluding certain source data events, making it necessary to re-calculate some or all cached values. The question, then, is what to cache and how?

A yet further problem relates to the fact that computer-implemented analysis systems are effective in screening processes that, say, take a longer time to complete than what is considered normal for a process of a given type. Yet computers have poor or no abilities to understand why a process takes an abnormally long time to complete. Accordingly, there is a need for an interactive user interface via which a human user can focus on the problem spots in the logistic processes. Again, one should keep in mind that in addition to the high-level cognitive problem of what information should be provided to the human user, there are underlying technical problems of how to make the computer-implemented analysis system efficient enough such that interactive real-time analysis is possible. The combination of interactive usage and usage of advanced data mining and statistical analysis algorithms is especially advantageous when utilizing this invention.

DISCLOSURE OF THE INVENTION

An object of the present invention is thus to provide a method, an apparatus and a computer program product so as to solve or alleviate one or more of the problems identified above.

The object of the invention is achieved by aspects of the inventions as defined in the attached independent claims. The dependent claims and the following detailed description and drawings relate to specific embodiments which solve additional problems and/or provide additional benefits. Some aspects of the invention relate to methods for process analysis. Other aspects of the invention relate to computer systems for performing process analysis. Yet other aspects of the invention relate to computer-readable media embodying program code the execution of which in a computer system causes the computer system to carry out one or more of the methods according to the invention.

The present patent specification relates to a group of related inventions or feature sets that can be used individually or in combination. Specifically, the present invention corresponds to the second feature set in the following list of feature sets. The remaining feature sets or individual features from the remaining feature sets may be used as embodiments of the present invention.

A first feature set of the group of inventions relates to automatic discovery of processes in imported event instance data wherein the event instance data does not explicitly identify any processes.

A second feature set of the group of inventions relates to interactive filtering techniques by which a new process analysis can be made by using the results from previous process analysis as configuration options for the new process analysis, ie the set of currently presented processes and events can be dynamically altered.

A third feature set of the group of inventions relates to an optimized caching scheme which expedites analysis of the identified processes in a server system, such as an SQL server.

A fourth feature set of the group of inventions relates to techniques for efficient identification and processing of categories of processes.

A fifth feature set of the group of inventions relates to techniques for analysing the discovered process instances and making a prediction and suggestion based on the analysis.

Each of the feature sets can be embodied as methods, computer systems or computer-readable media carrying computer program products.

It will be apparent that each of the first through fifth feature sets solves one or more technical problems relating to processing efficiency, data security or the like. Improved processing efficiency and/or data security enable users to perform interactive real-time analyses on the real-world processes supported by the one or more information management systems. For instance, attempts to understand, analyse and improve the current processes involve a variety of the following challenges:

-   Actual business processes contain multiple different variations.
-   Understanding of the current as-is situation, which is necessary for developing processes and improving quality, is difficult because prior art information management systems provide little or no useful information pertinent to cause-effect relations between events.
-   Large amounts of detail-level information need to be captured in order to identify root causes for problems and to prioritize solutions.
-   Prior art information management systems provide little or no useful information pertinent to automated discovery and reverse-engineering of business processes.
    -   There is thus a need to create understanding of the as-is situation, whereby analyses and actions can be based on facts.
    -   There is a need to detect information that allows benchmarking of processes within organizations.
    -   Similarly, there is a need for direction to improvement efforts.
    -   Any analysis of business processes should be reproducible whereby changes can be verified.
-   Examples of business scenarios:
    -   An organization may have gone live with an ERP system during the past year and is unsatisfied with the current situation.
    -   The organization wants to improve its understanding of the as-is situation in order to focus development work on the most important issues.

Some important results from an analysis of an Order-to-Cash process include:

-   Fact-based illustrations of the Order-to-Cash process:
    -   Documentation of how the ERP system is used (IT perspective).
    -   Service levels and delivery times (customer perspective).
    -   Process variation and needs for improvement (process perspective).
-   Identification of root causes for not meeting the delivery times.
-   Quick wins and other “clues” for further analysis.
-   Process analysis increases understanding between IT and business. It prepares the ground for their joint development work in the future.
-   The analysis confirms several challenges, both small and large. With large business volumes relatively small issues become major problems.
-   The analysis identifies important process measures and provides performance data.

The first feature set of the invention, which relates to automatic discovery of processes in imported event instance data, wherein the event instance data does not explicitly identify any processes, can be embodied as a method in the following manner. The method steps are labelled a) through f) merely to facilitate discussion and not to restrict the order of execution of the steps:

A method for analyzing information derived from event data by a computer-implemented analysis system, which comprises a server and one or more clients, wherein the event data describes a real-world process the execution of which is supported by at least one information management system but the real-world process is not directly connectable with the computer-implemented analysis system, the method comprising the following acts performed by the server:

-   a) importing event instance data comprising a plurality of event instance data sets from the at least one information management system, wherein each event instance data set comprises one or more attributes describing an event instance in the real-world process;
-   b) determining for each imported event instance data set a corresponding process instance based on at least the attributes of the imported event instance data set;
-   c) determining at least one event order attribute for each imported event instance data set based on at least other event instance data sets corresponding to the same process instance;
-   d) forming an analysis result set based on at least the event instance data sets and at least one event order attribute;
-   e) sending the analysis result set to one or more clients.

The method further comprises:

-   f) at the one or more clients, presenting an analysis utilizing the analysis result set.

The introductory portion of the method reflects a technical problem underlying the invention, namely the fact that the event data describes a real-world process the execution of which is supported by at least one information management system but the real-world process is not directly connectable with the computer-implemented analysis system. Therefore any detection and observation of meaningful processes must take place indirectly, by importing event instance data from the information-management system. For the sake of clarity and brevity, the information-management system supporting the real-world process will be called an ERP system, whether or not it actually meets any formal criteria for ERP systems.

In the context of the present invention, importing event instance data, which comprises event instance data sets, from the information management system(s) to the analysis system means that the server of the analysis system receives the event instance data regardless of whether the server of the analysis system or an external entity initiates such importing. In some implementations, the server of the analysis system may perform data mining operations on the database of another information management system. Alternatively or additionally, one or more external entities may proactively relay event instance data to the server of the analysis system. The imported data may be complemented by external data. For instance, the event instance data sets may include an identifier for CustomerID and the external data may provide more attributes for each customer based on the CustomerID identifier found in the event instance data set. Such external data may also contain similar data concerning the discovered process instances, ie, the process instances may be discovered from event data, and then the external data may be used to provide more information on the identified process instances. Such external data can be utilized to describe event instance data sets, process instances, event types and flow instance types, for example. As used herein, a flow instance means a transition between two event instances. Correspondingly, a flow instance type means a transition between two event types.

The event instance data imported from an information management system, such as the ERP system, does not explicitly relate to any meaningful processes, or if it does, the definition of processes for the purposes of the ERP system may not be applicable for the process analysis. Accordingly, step b) comprises determining for each imported event instance data set a corresponding process instance based on at least the attributes of the imported event instance data set. Attributes are an established term in the field of object-oriented modeling and processing. By way of example, attributes may store information relevant to the order of events, such as time stamps, or they may indicate resources or other events relating to the events.

In addition to the attributes of the imported event instance data set, the determination of the corresponding process instance may be based on external data. For example, in an order-to-cash process analyzed from a customer order perspective, a process may include a customer order identified as “Customer order 543”, and for that customer order a delivery event “Delivery 23” that is directly connected to the customer order. The event data may then include a sub-delivery event “Sub-delivery 76”. It may be that the event instance data set for “Sub-delivery 76” does not include information about the end customer order, but instead includes information about the delivery to which it belongs. In this case, the event cannot be linked directly to “Customer order 543” by using only the attributes in the event instance data set “Sub-delivery 76”, so one must first identify the corresponding delivery event (based on the information in “Sub-delivery 76”) and then identify the corresponding customer order. The information of “Delivery 23” is then used as external data for this particular event instance data set “Sub-delivery 76”.
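The two-step lookup in this example can be sketched as follows. This is a minimal illustration rather than the method of the specification: the field names `delivery` and `customer_order`, the lookup table and the function `resolve_customer_order` are assumptions made for the sketch; only the three identifiers come from the text.

```python
# Hypothetical external data: which customer order each delivery belongs to.
DELIVERY_TO_ORDER = {"Delivery 23": "Customer order 543"}


def resolve_customer_order(event, delivery_to_order):
    """Return the customer order (process instance) for an event instance data set.

    If the event carries the customer order itself, use it directly.
    Otherwise go through the delivery the event belongs to (external data).
    Returns None when no link can be established.
    """
    if "customer_order" in event:
        return event["customer_order"]
    return delivery_to_order.get(event.get("delivery"))


# Event instance data sets with assumed attribute names.
delivery_event = {"name": "Delivery 23", "customer_order": "Customer order 543"}
sub_delivery_event = {"name": "Sub-delivery 76", "delivery": "Delivery 23"}
orphan_event = {"name": "Sub-delivery 99", "delivery": "Delivery 77"}
```

An event that cannot be linked yields `None`, which leaves the decision of how to treat unassigned events to the caller.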

As another example of determining an event type based on information in the event instance data set and external data, a patient handling process will be described next. An event instance data set for a patient handling process may include an activity code for the operation performed by a doctor. In the event instance data set, the code may be presented in a very detailed way so that there are thousands of different codes. For analysis purposes it might be desirable to categorize the codes so that there are only 10 different categories into which the codes belong. Now for each event instance data set, we first identify the code from the data set, then use an external mapping table to identify the corresponding group and then use this group as the event type value.
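A minimal sketch of such a mapping step is shown below. The activity codes, category names, the `activity_code` field and the fallback category "other" are all invented for illustration; the text only states that detailed codes are mapped to a small number of groups through an external table.

```python
# Hypothetical external mapping table from detailed activity codes to categories.
CODE_TO_CATEGORY = {
    "AB1234": "surgery",
    "AB1299": "surgery",
    "XK0001": "ward visit",
}


def event_type_for(event, mapping, default="other"):
    """Use the category of the activity code as the event type value.

    Codes missing from the mapping table fall back to `default`
    (an assumption of this sketch, not something the text prescribes).
    """
    return mapping.get(event["activity_code"], default)


events = [
    {"patient": 1, "activity_code": "AB1234"},
    {"patient": 1, "activity_code": "XK0001"},
    {"patient": 2, "activity_code": "ZZ9999"},
]
event_types = [event_type_for(e, CODE_TO_CATEGORY) for e in events]
```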

Step c) comprises determining at least one event order attribute for each imported event instance data set based on other event instance data sets corresponding to the same process instance. In a normal environment in which the invention is used, events within the ERP system are separate from each other. According to the first feature set of the invention, the event order attribute (also referred to herein as the second attribute) typically indicates an order or sequence for the event instances for which a common process instance has been determined. For instance, the second attribute(s) for an event instance may indicate a successor (and/or predecessor) of the event instance, or it may be a variable having as a value a timestamp of the event.

For the purposes of process analysis, step d) comprises forming an analysis result set based on at least the event instance data sets and at least one event order attribute, while step e) comprises sending the analysis result set to one or more clients. In step f) the one or more clients present an analysis utilizing the analysis result set.
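Steps a) through d) can be illustrated with a short sketch. It assumes, purely for illustration, that an `order_no` attribute identifies the process instance, that timestamps define the event order, and that the analysis result set is a count of flow instance types; the sample data and all function names are hypothetical.

```python
from collections import defaultdict
from datetime import datetime

# Step a: hypothetical event instance data sets as imported from an ERP system.
EVENTS = [
    {"event_id": 1, "order_no": "A", "type": "order created", "time": datetime(2012, 1, 2)},
    {"event_id": 2, "order_no": "B", "type": "order created", "time": datetime(2012, 1, 3)},
    {"event_id": 3, "order_no": "A", "type": "goods shipped", "time": datetime(2012, 1, 5)},
    {"event_id": 4, "order_no": "A", "type": "invoice paid", "time": datetime(2012, 1, 9)},
    {"event_id": 5, "order_no": "B", "type": "goods shipped", "time": datetime(2012, 1, 4)},
]


def assign_process_instances(events, key="order_no"):
    """Step b: group events by an attribute assumed to identify the process instance."""
    instances = defaultdict(list)
    for event in events:
        instances[event[key]].append(event)
    return instances


def add_order_attributes(instances):
    """Step c: derive order number and successor from the other events of the same instance."""
    ordered = []
    for process_id, events in instances.items():
        events = sorted(events, key=lambda e: e["time"])
        for position, event in enumerate(events):
            has_next = position + 1 < len(events)
            successor = events[position + 1]["event_id"] if has_next else None
            ordered.append({**event, "process_id": process_id,
                            "order": position + 1, "successor": successor})
    return ordered


def form_analysis_result_set(ordered_events):
    """Step d: count flow instance types (transitions between two event types)."""
    by_id = {e["event_id"]: e for e in ordered_events}
    flows = defaultdict(int)
    for event in ordered_events:
        if event["successor"] is not None:
            flows[(event["type"], by_id[event["successor"]]["type"])] += 1
    return dict(flows)


ordered = add_order_attributes(assign_process_instances(EVENTS))
result = form_analysis_result_set(ordered)
```

The compact `result` dictionary is what a server would send to the clients in step e); the clients would only need to render it.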

Optionally, the first feature set may comprise determining (eg by calculating) at least one first attribute for each identified process instance based on at least the event instance data sets corresponding to the identified process instance. In an illustrative case, such a first attribute may comprise the number (or value) of sales over a period of time, for a product, for a customer or for a salesperson, or a group of products, customers or salespersons.

As used herein, the analysis result set is a set of data that is typically compiled from a large collection of data. But the analysis result set may not be in a form that is understandable to humans or other clients. Accordingly, the act of presenting the analysis on the basis of the analysis result set comprises converting the analysis result set to a format that is accessible to the client(s). In an illustrative but non-restrictive implementation the act of presenting the analysis result set comprises presenting the analysis result set graphically.

For example, the analysis result set may be formed based on first or second attribute(s). If, say, the first attribute indicates a duration of processes and the second attribute indicates a successor for each event, the analysis result set may comprise different sets of events between common starting and ending events. This kind of analysis, which belongs in a broad class of analyses called benchmarking, is helpful in identifying which of the processes proceeds from the start to the end in the least amount of time or with the lowest number of errors or complaints.
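One way to picture this kind of benchmarking is the sketch below. It is an assumption-laden illustration: process instances are given as time-ordered lists of (event type, day) pairs, a "variant" is the sequence of event types, and variants are ranked by mean duration. The data and the function `benchmark_variants` are invented for the example.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical process instances: time-ordered (event type, day number) pairs.
PROCESSES = {
    "P1": [("order", 0), ("pick", 1), ("ship", 3)],
    "P2": [("order", 0), ("credit check", 2), ("pick", 4), ("ship", 7)],
    "P3": [("order", 0), ("pick", 2), ("ship", 4)],
}


def benchmark_variants(processes, start, end):
    """Rank event sequences sharing the given start and end events by mean duration."""
    durations = defaultdict(list)
    for events in processes.values():
        if not events:
            continue
        variant = tuple(event_type for event_type, _ in events)
        if variant[0] == start and variant[-1] == end:
            # Duration (first attribute) = time of last event minus time of first event.
            durations[variant].append(events[-1][1] - events[0][1])
    return sorted((mean(values), variant) for variant, values in durations.items())


ranking = benchmark_variants(PROCESSES, start="order", end="ship")
```

Here the variant without a credit check comes first in `ranking` because its instances take 3 and 4 days, against 7 days for the other variant.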

Another technical problem underlying the invention is that processing of the event instance data should be fast enough so that interactive process analysis is meaningful. In a preferred implementation of the invention, all method steps which involve accessing large amounts of data are performed in the server, and only the presentation of the analysis, which is based on a relatively compact set of data, is performed in the client(s). In some implementations the server may be a database server, such as an SQL server, which has fast access to an SQL database. As a result, only the presentation step needs to be performed in workstations, which typically have slower interfaces to the server or database. Instead of the fast access to the database, or in addition to it, the feature of performing most of the method steps in the server may produce other benefits. For instance, the server-based implementation may improve data security as the client(s) do not gain access to the entire database. In some implementations the server may store sensitive data belonging to entities other than the one performing the process analysis, and all the sensitive data remains hidden from the clients. Finally, software installation is relatively easy because the clients only need a simple interface, such as a web browser. For the same reason, the processing load imposed on the clients is low. Also, a powerful server can serve a very large number of clients because the clients typically do not all use the system at the same time. The server can also have a very large data storage capacity. One good option is to deliver the server as a cloud-based service serving clients in a very large geographical area.

The second feature set of the invention, which relates to interactive filtering techniques by which a set of currently presented processes can be dynamically altered, can be embodied as a method in the following manner. The introductory portion of the method may be similar to that of the first feature set and is omitted for the sake of brevity. Again, labelling of the method steps does not restrict the order of execution of the steps. The method comprises the following acts performed by the server:

-   importing event instance data comprising a plurality of event instance data sets from the at least one information management system, wherein each event instance data set comprises one or more attributes describing an event instance in the real-world process;
-   determining for each imported event instance data set a corresponding process instance based on at least the attributes of the imported event instance data set;
-   determining at least one event order attribute for each imported event instance data set based on other event instance data sets corresponding to the same process instance;
-   forming an analysis result set based on at least the event instance data sets and at least one event order attribute;
-   sending the analysis result set to one or more clients; the method further comprising:
-   the one or more clients presenting an analysis utilizing the analysis result set, and in response to receiving an input that is related to the analysis, sending a request to the server;
-   the server forming filtered event instance data by excluding event instance data sets from the analysis based on the input received from the one or more clients;
-   repeating at least the above steps of forming the analysis result, sending the analysis result and presenting a revised analysis based on at least the filtered event instance data.
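The request-and-refilter cycle described in the steps above might look roughly like the following sketch. The class `AnalysisServer`, the request key `exclude_process_ids` and the use of a per-process event count as the analysis result set are assumptions made for illustration; a real server would also handle transport, persistence and access rights.

```python
def form_result_set(events):
    """Stand-in analysis result set: number of events per process instance."""
    counts = {}
    for event in events:
        counts[event["process_id"]] = counts.get(event["process_id"], 0) + 1
    return counts


def apply_filter(events, request):
    """Server side: exclude event instance data sets based on the client's input."""
    excluded = set(request.get("exclude_process_ids", ()))
    return [e for e in events if e["process_id"] not in excluded]


class AnalysisServer:
    """Keeps the current (possibly filtered) event data between client requests."""

    def __init__(self, events):
        self.events = list(events)

    def analyse(self, request=None):
        if request:
            self.events = apply_filter(self.events, request)
        return form_result_set(self.events)


imported = [
    {"process_id": "A", "type": "order"},
    {"process_id": "A", "type": "ship"},
    {"process_id": "B", "type": "order"},
]
server = AnalysisServer(imported)
first_result = server.analyse()
# The client inspects first_result, selects process "B", and asks to exclude it.
second_result = server.analyse({"exclude_process_ids": ["B"]})
```

Each new result set is derived from the previous one, which matches the idea of using the results of one analysis as configuration for the next.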

As stated in connection with the first feature set, a client-server architecture provides certain benefits relating to efficiency, data security, ease of installation and minimizing the burden on the clients. Similar benefits may be obtained in connection with the second feature set by performing the bulk of processing steps at the server and performing only the presenting steps at the client.

A benefit of this feature set is that it facilitates entry of parameters for the analysis. It is normally very difficult to give parameters for the analysis. In a typical scenario the original set of events may include a large number of events and process instances that must be excluded from the analysis so that the remaining set of events describes a meaningful set to the client. Interactive filtering according to the second feature set of the invention may use results like “duration of process instance”, “duration between particular events for the process”, “number of particular events for the process”, “processes belonging to a particular process variation”, “processes including particular events based on other event attributes” or “processes based on certain process attributes”. The analysis system may present the results to a client, letting the client make a selection directly in the results themselves, which reconfigures the source data so that the analysis is repeated with a new set of data. This is very logical for the client and leads to new analysis sets that are derived from each other. Also, the client may add new events to the analysis, making it possible to first analyze a smaller subset of data and then extend the analysis to a larger data set.

For instance, the creation of the subset may involve including only certain user-selected events and/or processes in the analysis result set, or excluding them from the analysis result set. The user may perform an initial analysis and use the results of the initial analysis to select subsets for further analyses.

In some implementations, each new process analysis set is stored in a database, possibly with some additional information. In one implementation, named “views”, which are essentially definitions for sets of event instance data sets, may be stored in the database for later analysis. In one implementation, specific users may get access rights only to some specific sets.

The stored view may include analysis parameters usable for recreating a specific analysis result set. This may mean, for example, that each view indicates what event instance data to include in the analysis, while the parameters of the view indicate which analysis report and which analysis report parameters are referred to.

The third feature set of the invention, which relates to an optimized caching scheme which expedites analysis of the identified processes in a server system, can be embodied as a method in the following manner. The introductory portion of the method may be similar to that of the first feature set and is omitted for the sake of brevity. Labelling of the method steps facilitates discussion but does not restrict the order of execution of the steps:

-   a) importing event instance data comprising a plurality of event instance data sets from the at least one information management system, wherein each event instance data set comprises one or more attributes describing an event instance in the real-world process;
-   b) determining for each imported event instance data set a corresponding process instance based on at least the attributes of the imported event instance data set;
-   c) determining at least one event order attribute for each imported event instance data set based on other event instance data sets corresponding to the same process instance;
-   d) calculating order information for each event instance data set so that for each event instance data set, an unambiguous and unique predecessor event and successor event can be deduced based on the order information and process instance identifier;
-   e) storing (eg caching) the calculated order information in the server;
-   f) forming an analysis result set based on at least the event instance data sets and the calculated order information;
-   g) sending the analysis result set to one or more clients.

At the one or more clients, an analysis is presented utilizing the analysis result set.

Again, some of the steps may be similar to those of the previous feature sets. Similarly to the first and second feature sets, a preferred implementation involves performing the majority of the steps in the server, and particularly all steps involving massive database operations, while only the presenting step is performed in the client(s).

In the calculating step d) the unambiguous and unique predecessor and successor events are other event instance data sets for the majority of event instances. For the first and last events, however, the predecessor and successor events, respectively, are empty so that it can be unambiguously seen that they indeed are the actual starting and ending events.

According to an optional feature, when the client sends the server a request to update the analysis result set, the server re-uses the previously created order information to speed up the calculations. For instance, if the process analysis system is coupled to an active information management system in which new event instances are entered continually, utilization of the cached calculated order information in the server permits updating of the analysis result set with sufficient speed so that interactive process analysis is possible. The information may also be re-used to make the interactive filtering of the second feature set faster to execute, and to create multiple analysis result sets from the same analysis data.

In some implementations, the server may store the calculated order information, wherein for every event instance data set, the cached information indicates at least the calculated order information and a database id for the successor or predecessor event instance data set. For instance, a specific order number, such as 1, may define the start event for a process instance, while the successor instance data set=NULL may define the end event for a process instance, or vice versa, such that a specific order number (eg −1) defines the last event for a process instance, while the predecessor instance data set=NULL defines the start event for the process instance. For the sake of clarity, most examples are described in such a manner that order=1 specifies the start event and successor=NULL defines the end event.
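The order information described above can be illustrated with a minimal sketch. It assumes in-memory records rather than database rows; the field names `event_id`, `case_id` and `timestamp` and the function `compute_order_info` are hypothetical and chosen only for illustration. The sketch follows the convention that order=1 marks the start event and a missing successor (`None`, standing in for NULL) marks the end event.

```python
from collections import defaultdict


def compute_order_info(events):
    """Return {event_id: (order, predecessor_id, successor_id)}.

    `events` is an iterable of dicts with the hypothetical keys
    'event_id', 'case_id' and 'timestamp'. Timestamps are assumed to be
    unique within each process instance.
    """
    # Group the events by process instance.
    by_case = defaultdict(list)
    for ev in events:
        by_case[ev["case_id"]].append(ev)

    order_info = {}
    for case_events in by_case.values():
        # Chronological order inside one process instance.
        case_events.sort(key=lambda ev: ev["timestamp"])
        last_index = len(case_events) - 1
        for i, ev in enumerate(case_events):
            pred = case_events[i - 1]["event_id"] if i > 0 else None
            succ = case_events[i + 1]["event_id"] if i < last_index else None
            # Order numbers start at 1; None plays the role of NULL.
            order_info[ev["event_id"]] = (i + 1, pred, succ)
    return order_info
```

A server could persist the resulting tuples next to each event row, so that later requests read the predecessor and successor directly instead of re-sorting the events.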

Alternatively or additionally, the server may store the calculated order information, wherein for each event instance the stored order information includes at least one attribute copied from the successor event, such as the timestamp or the event type of that event. The server may optionally use previously stored order information when calculating new order information for a new analysis based on a new set of event instance data sets. This feature is especially advantageous when the user is using the filtering capabilities of the second feature set and the set of event instance data sets is a subset of a larger set of event instance data sets for which the order information has already been created. This is also advantageous in situations in which one or more event instance data sets are added to an analysis system which already includes existing event instance data sets for which the order information has been calculated and stored.

The fourth feature set of the invention, which relates to techniques for efficient identification and processing of categories of processes, can be embodied as a method in the following manner. The introductory portion of the method may be similar to that of the first feature set and is omitted for the sake of brevity. Again, labelling of the method steps does not restrict the order of execution of the steps:

-   a) importing event instance data comprising a plurality of event instance data sets from the at least one information management system, wherein each event instance data set comprises one or more attributes describing an event instance in the real-world process;
-   b) determining for each imported event instance data set a corresponding process instance based on at least the attributes of the imported event instance data set;
-   c) determining at least one event order attribute for each imported event instance data set based on other event instance data sets corresponding to the same process instance;
-   d) calculating order information for each event instance data set so that for each event instance data set, an unambiguous and unique predecessor event and successor event can be deduced based on the order information and process instance identifier;
-   e) determining for each process instance an ordered list of related event instance data sets based on the order information of event instance data sets;
-   f) calculating process variation information for each process instance based on at least one attribute of each event in the ordered list of related event instance data sets;
-   g) storing (caching) the calculated process variation information in the server;
-   h) forming an analysis result set based on at least the event instance data sets and the calculated process variation information;
-   i) sending the analysis result set to one or more clients; and
-   j) at the one or more clients, presenting an analysis utilizing the analysis result set.

Again, steps a) through d) may be similar to those of the previous feature sets, and step e) can be similar to that of the third feature set. Likewise, the two last steps can be similar to the two last steps of the first feature set. Similarly to the first and second feature sets, a preferred implementation involves performing the majority of the steps in the server, and particularly all steps involving massive database operations, while only the presenting step is performed in the client(s).

In some implementations the server may store (eg cache) the calculated process variation information so that for every process instance the stored process variation information includes at least the ordered list of event type identifiers of the event instance data sets connected to the process instance. The server may store the calculated process variation information in such a manner that the ordered list of identifiers is stored in a single database attribute for the process variation so that it can be referred to with database functions. For instance, the database function may be a string processing function handling regular expressions.

For instance, the server may calculate and store a hash-like variable, which is calculated based on the process variation in such a manner that for each new process instance the server can effectively search and identify an already existing process variation or create a new variation. As used herein, “a hash-like variable calculated based on the process variation” means that the variable is calculated by means of a function that has specific features of hash functions, but not necessarily all of them. Specifically, the hash-like function should provide two features. A first desirable feature is strong lossy compression from the input space to the output space. As a result, even extremely long process variations are compressed to a bit or character string that is searchable by the string processing functions of the server. Because of the lossy compression, the other desirable feature is an approximately even distribution of output variables regardless of the distribution of the input variables. As a result of the lossy compression, there are several different input variables (descriptors of process variations) that are compressed to the same output variable. A benefit of the even distribution of the output variables is that the entire output variable space is used evenly. This feature reduces the risk of having specific output variables that correspond to huge numbers of different input variables.
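As a hedged illustration of the hash-like variable, the following sketch compresses an ordered list of event types into a short fixed-length key and uses that key to find an existing process variation or to create a new one. The use of SHA-256 truncated to 16 hexadecimal characters is an assumption made for this example only; the text above merely requires lossy compression and an approximately even output distribution. The names `variation_key` and `VariationIndex` are hypothetical, and the dictionary stands in for a database table.

```python
import hashlib


def variation_key(event_types):
    """Map an ordered sequence of event types to a short fixed-length key."""
    # The unit separator is assumed not to occur inside event type names.
    text = "\x1f".join(event_types)
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:16]


class VariationIndex:
    """Hypothetical in-memory stand-in for the server's variation store."""

    def __init__(self):
        self._by_key = {}   # key -> list of (variation_id, event type tuple)
        self._next_id = 1

    def find_or_create(self, event_types):
        seq = tuple(event_types)
        bucket = self._by_key.setdefault(variation_key(seq), [])
        # The compression is lossy, so a key match is confirmed against
        # the full stored sequence before the variation is re-used.
        for variation_id, stored in bucket:
            if stored == seq:
                return variation_id
        variation_id = self._next_id
        self._next_id += 1
        bucket.append((variation_id, seq))
        return variation_id
```

Because several inputs can share one key, the lookup compares the full sequence only within the small bucket selected by the key, which keeps the search cheap even for very long process variations.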

Still further, the server may use previously stored variation information when calculating new variation information for a new analysis based on a new set of event instance data sets. This feature is advantageous when the user is using the filtering capabilities of feature set 2 and the set of event instance data sets is a subset of a larger set of event instance data sets, for which the variation information has already been created. This feature is also advantageous in cases wherein one or more event instance data sets are added to the analysis system, which already includes existing event instance data sets, for which the variation information has been calculated and stored.

A distinctive feature of the fourth feature set is the calculation of the process variation information for each process instance. In one specific implementation the process variation information is based on the event types of the ordered set of event instance data sets. This means that the process variation is defined by the chain of event types (or event classes) traversed by the process. Any two or more processes are in the same process variation if the processes have exactly the same sets of event types in exactly the same order. In other words, any processes within a single process variation may only differ from each other in respect of the detail level of event instance data sets, but the event types must be the same and in the same order. For example, consider the following events:

ProcessInstance   Activity   Person   Timestamp
01                Create     John     1 Jan 2012
01                Modify     Mary     2 Jan 2012
01                End        John     3 Jan 2012
02                Create     John     4 Jan 2012
02                Modify     John     5 Jan 2012
02                End        John     6 Jan 2012

When the process variation is calculated for the event instance data set attribute “Activity”, then a possible process variation for process instances 01 and 02 can be “Create, Modify, End” and “Create, Modify, End”, respectively. If the variation is calculated for attribute “Person”, then a possible process variation for process instance 01 can be “John, Mary, John” and for the instance 02: “John, John, John”.
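The example above can be reproduced with a short sketch. The tuple layout and the function name `process_variations` are assumptions made for illustration; the data are the six events of the example table.

```python
from datetime import date

# (process instance, activity, person, timestamp), as in the example table.
EVENTS = [
    ("01", "Create", "John", date(2012, 1, 1)),
    ("01", "Modify", "Mary", date(2012, 1, 2)),
    ("01", "End", "John", date(2012, 1, 3)),
    ("02", "Create", "John", date(2012, 1, 4)),
    ("02", "Modify", "John", date(2012, 1, 5)),
    ("02", "End", "John", date(2012, 1, 6)),
]

ACTIVITY, PERSON = 1, 2  # column positions inside each tuple


def process_variations(events, attribute_index):
    """Return {process instance: variation string} for one attribute."""
    by_case = {}
    # Sort by process instance and then chronologically.
    for row in sorted(events, key=lambda r: (r[0], r[3])):
        by_case.setdefault(row[0], []).append(row[attribute_index])
    return {case: ", ".join(values) for case, values in by_case.items()}
```

Calling `process_variations(EVENTS, ACTIVITY)` places both process instances in the same variation, whereas `process_variations(EVENTS, PERSON)` separates them.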

According to an optional feature, when the client sends the server a request to update the analysis result set, the server re-uses the previously created process variation information to speed up the calculation. Again the calculated information may be re-used to make the second feature set (interactive filtering) faster to execute. Also the information may be re-used in creating multiple analysis result sets from the same analysis data.

The fifth feature set of the invention, which relates to techniques for analysing the discovered process instances and making a prediction and suggestion based on the analysis, can be embodied as a method in the following manner. The introductory portion of the method may be similar to that of the first feature set and is omitted for the sake of brevity. Again, labelling of the method steps does not restrict the order of execution of the steps:

-   importing event instance data comprising a plurality of event instance data sets from the at least one information management system, wherein each event instance data set comprises one or more attributes describing an event instance in the real-world process;
-   determining for each imported event instance data set a corresponding process instance based on at least the attributes of the imported event instance data set;
-   determining at least one event order attribute for each imported event instance data set based on at least other event instance data sets corresponding to the same process instance;
-   creating a causal model based on at least the event instance data sets;
-   using the causal model to calculate a probability for at least one predicted future event not included in the data for at least one process instance based on at least the causal model;
-   forming an analysis result set based on at least the event instance data sets and at least one predicted future event;
-   sending the analysis result set to one or more clients; the method further comprising:
-   at the one or more clients, presenting an analysis utilizing the analysis result set.

Again, some of the steps may be similar to those of the previous feature sets. Similarly to the first and second feature sets, a preferred implementation involves performing the majority of the steps in the server, and particularly all steps involving massive database operations, while only the presenting step is performed in the client(s).

The causal model may be created based at least partially on linear correlations between event types derived from event instance data sets imported to the system. Alternatively or additionally, the causal model may be created based at least partially on separately imported general prediction data.

In some implementations, the server may calculate the probability for an event of a particular event type. Alternatively or additionally, the server may calculate the probability for an event of a particular timestamp. Yet further, the server may calculate the probability for a process instance to match any particular process variation. The server may calculate some or all of these probabilities based on at least influence data that is derived from event instance data sets imported to the system.

The server may further calculate the probabilities based on at least one additional event of a certain event type imported to the analysis system. In one illustrative example, the server may determine that activity X maximizes the probability for a desired future event and suggest that a salesperson performs activity X. A distinctive feature of the fifth feature set is that the probability of a predicted future event is calculated for at least one process instance. This comprises a method wherein the system is used to predict probabilities for certain events for process instances. For example a sales process analysis system may calculate the probability of a sales case becoming Won or Lost based on the prior events loaded into the system for other sales cases and this particular sales case. The analysis system can then provide answers to a number of questions, examples of which follow shortly.

In the list of exemplary questions, the term “status of a process instance” refers to a chronologically latest value of a given event type attribute for a process instance. For example, the status of a sales case can be regarded as the value of event attribute “status” recorded in event instance data sets whose event type is “StatusChanged”. The process instance may have a variety of events, and for determining the status of the process instance, the analysis system first filters out all event instance data sets except the ones whose event type is “StatusChanged”. Then the analysis system locates the last one of those event instance data sets separately for each process instance and defines its value as the status for the process instance. A prediction about a future status then means that in the future, there will occur a new event whose event type matches our definition of status. In other words, the event type may be “StatusChanged” and the value in the “status” attribute would be the predicted value.

The following is a non-exhaustive list of possible predictions that can be performed by an embodiment of the invention that implements the fifth feature set:

-   How likely is it for the “sales case X” currently in a status of “Offer Sent Out” to reach a status of “Customer Purchase”? The answer could be a probability percentage in a range of 0% to 100%.
-   How likely is it for the “sales case X” currently in a status of “Offer Sent Out” to reach a status of “Customer Purchase” by the last day of this quarter? The answer could again be a probability percentage in a range of 0% to 100%.
-   What would be the likelihood of “sales case X” currently in a status of “Offer Sent Out” reaching a status of “Customer Purchase” during a certain month within the next 3 year period, given the assumption that the “sales case X” will reach that status? The answer could be a list of probability percentages in a range of 0% to 100% for all future months for 3 years, and then possibly a leftover probability of the status being reached outside the 3 year period.
-   Given a possibility of adding a new event of at least one event type to the event instance data set for a given period for at least one process instance, what event type would have the best effect in raising the probability of a process instance reaching a desired goal status? The answer could be for example that “adding event of type ‘e-mail sent to customer’ would increase the probability of sales case X of reaching the customer purchase status from 23% to 45%.”
-   Prediction could also be used for all open sales cases, ie, sales cases currently having a certain status, such as “not closed”. In this scenario the system could show all cases for a given salesperson, and assuming that the salesperson was able to write an e-mail message, the analysis system could predict how big an effect the e-mail message may have on the probability of any of these sales cases reaching a “customer purchase” status within the current quarter. With this information, the salesperson could then write the e-mail to the customer for whose sales case the effect is maximal.
-   The salesperson of the previous example may also be able to perform any of a number of other activities, ie, to conduct a real life activity that may produce an event instance data set with a particular event type and attribute values. In one illustrative example, sending an e-mail requires two hours, making a phone call requires one hour, sending an offer to a customer for whom a pre-study has been made requires four hours and making a personal visit to the customer takes eight hours. Considering this situation, the analysis system can suggest an optimal way to spend the next two business days, or 16 business hours, given the current status of process instances. This mode of utilizing the invention helps maximize the number of process instances reaching a “customer purchase” status during the on-going quarter.
-   Moreover, in all of the above examples, the increase in the probability of reaching a certain status may be further used to calculate the business effect of a case obtaining this status. This may mean that an increase from a 23% probability to a 45% probability for a case with a business value of $10,000 would mean an increase of 22%×$10,000=$2,200 in value for the business. In this way the analysis system can suggest, for any given set of process instances and any given set of possible actions, an optimal set of actions that should be performed (which actions to which process instances by which person) in order to maximize the business outcome of the processes.
-   As a still further example, the user may request the analysis system to give an estimate for any particular case. The analysis system may also assist the salesperson in selecting a most beneficial activity, ie, the activity and the process instance the salesperson should do next to maximize their sales pipeline value and the probability of sales cases obtaining a “won” status. Such an analysis may be performed automatically, in such a manner that the system performs an activity when a certain result appears in the analysis results.
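The business-value arithmetic in the list above can be sketched as follows. The function names are hypothetical. The probabilities for the phone call and the personal visit are invented purely for illustration; only the e-mail figures (23% to 45%, a $10,000 case) and the hours per activity come from the examples above. Ranking candidate actions by expected gain per hour is a simple heuristic assumed here, not a statement of how the analysis system selects an optimal set of actions.

```python
def expected_value_gain(p_before, p_after, case_value):
    """Expected increase in business value when the probability rises."""
    return (p_after - p_before) * case_value


def best_action_per_hour(candidates, case_value):
    """Pick the candidate with the highest expected gain per hour spent.

    `candidates` is a list of (action, p_before, p_after, hours) tuples.
    """
    return max(
        candidates,
        key=lambda c: expected_value_gain(c[1], c[2], case_value) / c[3],
    )


# The e-mail row uses the figures from the text; the other two
# probability values are illustrative assumptions only.
CANDIDATES = [
    ("e-mail", 0.23, 0.45, 2),
    ("phone call", 0.23, 0.30, 1),
    ("personal visit", 0.23, 0.60, 8),
]
```

With these inputs, `expected_value_gain(0.23, 0.45, 10_000)` evaluates to approximately 2,200, matching the 22%×$10,000 example, and the e-mail is the candidate with the highest gain per hour.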

Calculations in the fifth feature set may be performed using the following techniques:

-   Constructing a causal model to find correlations between any given status and given predicted status. The model is preferably constructed in such a way that the event instance data sets are not treated as independent. This means that the model takes into account each event instance data set corresponding to the process instance and uses all these past events in finding correlations and influencer information.
-   The causal model could use information from all process instances, possibly with different weightings based on process instance attributes. For example sales cases where the salesperson is the same may have a bigger effect than sales cases, ie, process instances, where the salesperson varies. Also process instances belonging to the same organizational or geographical unit might have a bigger influence than process instances from different units. The same methodology could be used for process instances where the same product or a product from the same product group has been sold. And yet another example may be based on the same or a similar customer.
-   The causal model may also use benchmarking data from other customers. For example a cloud-based customer relationship management system, like “salesforce.com”, may contain data for multiple organizations in the system. Organizations may provide their sales process data as benchmarking data for constructing a causal model so that similarities in other organizations, possibly from the same industry, the same geographical location, the same customer, or an otherwise similar business, may be shared.

The reader is reminded of the fact that the problems underlying the invention are a mixture of cognitive and technical nature. Although the argument can be made that detection of the business processes and understanding of the causal and other relations is a cognitive problem, the technical problems relate to performing the analyses with sufficient speed so that interactive process analysis is meaningful. This in turn involves the problem of how to perform the various detection and calculation steps in the server, in view of the fact that typical server software does not provide built-in functions for such purposes.

Optional Features for all Feature Sets

Some optional features, which can be combined with any of the above-identified first through fourth feature sets, will be described next.

For the purpose of importing event instance data sets, multiple event instance data sets can be created from a single event instance data set in the actual source data from an external system. For example a real-world event may have attributes like “event started” and “event completed”. Part of this invention is that for the sake of the analysis it may be advantageous to break each available timestamp value of each event into a separate event instance data set. By using this technique, parallel events may be managed so that each event itself has no duration and there are no overlapping events. This makes the analysis much faster while still preserving full capabilities for analysing parallel real-world events with individual start/stop/continue/halt/waiting kind of atomic events for each real-world event. Also when constructing the order information it may be useful to utilize external information like “event instances of event type ‘first aid started’ should always be placed before an event of type ‘first aid test 1’” even though both events would have exactly the same timestamp value. This means that when calculating for example the process variations, these two events will cause fewer variations in case there exist many process instances having both event types with the same timestamp. The external order information may also be deduced from the analysis itself, ie, if Event X typically occurs before Event Y, then the analysis system may use a rule to always place Event X before Event Y in case they both happen to have the same timestamp in an individual process instance.

An optional feature of the invention involves processing the event instance data sets that result from real-world events in such a way that each event imported to the system has exactly one timestamp value indicating the actual moment in real time when the event occurred. For each event corresponding to the same process instance, the timestamp value is different. In the real world this is not always true, since in the real world there may be events that occur exactly simultaneously. Moreover, typical real-world events may have multiple timestamps, such as a start timestamp and finish timestamp, whereby multiple events may occur simultaneously. For example a patient in a hospital treatment process can visit a hospital in a “ward stay” that takes three days, and during that ward event there can be a “discussion with doctor” event that takes 20 minutes. During that discussion, the doctor may measure blood pressure at a certain timestamp x. In this situation the real-world events like “ward stay” and “discussion with doctor” are divided into sub-events such as “ward stay start”, “ward stay finish”, “discussion with doctor start” and “discussion with doctor finished”. These sub-events are then assigned a unique timestamp within the process instance. A benefit of the unique timestamp is that the analysis system may obtain unambiguous order information for all events being imported into the analysis system. In some applications, it may be beneficial to artificially force ordering rules in such a manner that there is a well-defined fixed order. From the point of view of the analysis system the benefit is that the fixed order enables use of algorithms that are efficient with very large data sets. For human users the fixed order facilitates comprehension of the analyses.
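The splitting of real-world events into atomic sub-events and the rule-based ordering of events with equal timestamps, both described above, can be sketched as follows. The record layout, the suffixes “started” and “completed”, the integer timestamps and the rule table `type_priority` are all assumptions made for illustration.

```python
def split_into_atomic_events(source_events):
    """Turn each source event into one atomic event per timestamp.

    `source_events` holds dicts with the hypothetical keys 'case_id',
    'name', 'started' and 'completed'.
    """
    atomic = []
    for ev in source_events:
        atomic.append((ev["case_id"], ev["name"] + " started", ev["started"]))
        atomic.append((ev["case_id"], ev["name"] + " completed", ev["completed"]))
    return atomic


def order_events(atomic_events, type_priority):
    """Order by process instance and timestamp; break ties with a rule table.

    `type_priority` maps an event type to a rank; a lower rank is placed
    first when two events of one process instance share a timestamp.
    Event types without a rule are placed after those with a rule.
    """
    default_rank = len(type_priority)
    return sorted(
        atomic_events,
        key=lambda e: (e[0], e[2], type_priority.get(e[1], default_rank)),
    )
```

For example, with the rule table `{"first aid started": 0, "first aid test 1 started": 1}`, two sub-events that share a timestamp are always emitted in the same order, so they do not produce spurious process variations.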

For the purposes of duration analysis, at least one date-time attribute may be determined for each event instance data set based on at least the information contained in the event instance data set. Optionally, at least one duration attribute may be calculated for each process instance based on at least the difference between the date-time attribute of the chronologically last event instance data set and the chronologically first event instance data set, wherein the last and first event instance data sets correspond to the process instance in question. For instance, the duration analysis may further comprise determining information on the total duration of all process instances. According to a further optional feature, the process instances may be divided into categories based on the value of the duration for each process instance. For example, the categories may include seconds, minutes, days, weeks, months, or years. According to another further optional feature, the duration analysis may include generation of a table or graph, which shows the discovered categories and the numbers of process instances belonging to each category.
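A minimal sketch of the duration analysis follows. The category thresholds, the additional “hours” bucket and the function names are assumptions made for this example; the text above does not specify where one category ends and the next begins.

```python
from collections import Counter
from datetime import datetime

# (upper limit in seconds, label); the limits are illustrative assumptions.
CATEGORY_LIMITS = [
    (60, "seconds"),
    (3600, "minutes"),
    (86400, "hours"),
    (7 * 86400, "days"),
    (30 * 86400, "weeks"),
    (365 * 86400, "months"),
]


def case_durations(events):
    """Return {case_id: last timestamp minus first timestamp}.

    `events` is an iterable of (case_id, datetime) pairs.
    """
    first, last = {}, {}
    for case_id, ts in events:
        first[case_id] = min(ts, first.get(case_id, ts))
        last[case_id] = max(ts, last.get(case_id, ts))
    return {case_id: last[case_id] - first[case_id] for case_id in first}


def duration_category(delta):
    """Map a timedelta to the first category whose limit it stays under."""
    seconds = delta.total_seconds()
    for limit, label in CATEGORY_LIMITS:
        if seconds < limit:
            return label
    return "years"


def duration_histogram(events):
    """Count process instances per duration category."""
    return Counter(duration_category(d) for d in case_durations(events).values())
```

The resulting counter corresponds to the table or graph of categories and process instance counts mentioned above.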

Further useful analyses may be based on the detection of event types. One optional feature comprises determining a value for at least one event-type category for each event instance data set based on at least the information contained in the event instance data set. Later in this patent specification, an event instance data set is said to belong in a certain event-type if the value of the event-type attribute (of the event instance data set) that was selected for the analysis is the same as the value of the event-type. For some analysis types, the event types may be based on actions performed, while for other analysis types the event types may be based on a person or resource involved in the action. As a still further example, the event type may be based on patient disease diagnosis code. Yet further, an event type may be a combined value. For example, a combined event type value in an order-to-cash process may be based on the action performed and the person performing the action. Consider the following example:

ProcessInstance   Activity   Person   Timestamp
01                Create     John     1 Jan 2012
01                Modify     Mary     2 Jan 2012
01                End        John     3 Jan 2012
02                Create     John     4 Jan 2012
02                Modify     John     5 Jan 2012
02                End        John     6 Jan 2012

The above feature may be modified such that the analysis includes information on the occurrences of at least one determined event-type attribute value. For example, the event-type attribute value may indicate one or more of the following: 1) total number of event instance data sets per each unique value of event-type attribute; 2) total number of process instances that are linked to at least one event instance data set having a certain event-type attribute value; 3) relative occurrence (using Timestamp from A and Name from B) showing average relative occurrence of every event instance data set corresponding to a certain event-type in relation to all event instance data sets separately for each process instance; and 4) relative occurrence variation for the above mentioned average.

For performing case table analysis, any of the above-identified feature sets may be complemented with a feature of calculating at least one event-type-amount attribute for each process instance, so that the value of a particular event-type-amount attribute is equal to the number of event instance data sets of the particular event-type corresponding to the process instance. Optionally, the case table analysis may include information on at least one event-type attribute for at least one process instance.
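The event-type-amount attributes of the case table can be sketched as a per-process-instance counter. The function name `case_table` and the pair layout of the input are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def case_table(events):
    """Return {case_id: Counter of event types} for (case_id, event_type) pairs."""
    table = defaultdict(Counter)
    for case_id, event_type in events:
        # One event-type-amount attribute per (process instance, event type).
        table[case_id][event_type] += 1
    return dict(table)
```

Because a `Counter` returns 0 for missing keys, an event type that never occurs in a process instance naturally yields an amount of zero.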

According to another optional feature, each process instance may be categorized into a process variation on the basis of the process instance data and attributes, as well as the event instance data set information and attributes. The process variation for a process instance may be formed by first ordering the event instance data sets by the derived date-time attributes in a chronological order and then taking the event-type attribute values from each event instance data set. What this means is that any two process instances belong to the same process variation if they have exactly the same number of events and, when the events are ordered by their date-time attributes, the event-types of the events are exactly the same and in exactly the same order for both process instances. Optionally, some event instance data sets may be excluded by a rule set when calculating the process variation. The rule set may be permanent, semi-permanent or dynamically alterable via the user interface.

According to a further optional feature, some additional calculated events may be introduced based on information derived from the attribute values, such as the duration between two events or the number of repeating events having the same event-type. Alternatively or additionally, the process variation for a process instance may be formed by calculating the number of event instance data sets belonging in the process instance. Alternatively or additionally, the process variation for a process instance may be formed by calculating the number of unique event instance data sets belonging in the process instance, so that all the event instance data sets having the same event-type value are counted as a single event when calculating the total number of unique events. Optionally, the analysis may further comprise determination of one or more of the following:

-   information on the occurrences of process instances for each discovered process variation, or
-   number of the occurrences of process instances for each discovered process variation, or
-   relative number (eg a percentage) of the occurrences of process instances for each discovered process variation per the total number of process instances, or
-   any of above but the result set is limited to a maximum of X process variations.

Process analysis may be further enhanced by one or more of the following features. One optional feature comprises determining a flow instance for each two consecutive event instance data sets for any process instance. The attributes of the flow instance may include a predecessor event and/or a successor event. For example, the two consecutive event instance data sets may belong in the same process instance, and when all events belonging to the same process instance are ordered by the date-time, so that each event has a unique date-time, there are no events falling between these two events. In other words, these two events are predecessor and successor events for each other.

According to a further optional feature, new attribute values may be calculated for each flow instance. The new attributes may include one or more of the following: 1) predecessor date-time; 2) successor date-time; 3) duration (which may be calculated as successor date-time minus predecessor date-time); 4) cost (which may be calculated as a minimum, maximum or average of the costs of the predecessor and successor events).

Yet further, for each discovered flow instance, a corresponding flow type may be determined by combining the event type of the predecessor event instance data set and the event type of the successor event instance data set, wherein the event type attributes include at least the event type of the predecessor event instance data set and the event type of the successor event instance data set.
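As a hedged sketch of flow instances and flow types, the following assumes events given as (case identifier, event type, date-time, cost) tuples with unique date-times inside each process instance. The dictionary keys and the choice of the average as the cost rule are illustrative assumptions; minimum or maximum would fit the description equally well.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def flow_instances(events):
    """Build one flow instance per pair of consecutive events of a process instance."""
    by_case = defaultdict(list)
    for case_id, event_type, when, cost in events:
        by_case[case_id].append((when, event_type, cost))
    flows = []
    for case_id, evts in by_case.items():
        evts.sort(key=lambda e: e[0])  # order by date-time within the case
        for (t1, type1, c1), (t2, type2, c2) in zip(evts, evts[1:]):
            flows.append({
                "case": case_id,
                "flow_type": (type1, type2),   # predecessor type combined with successor type
                "predecessor_time": t1,
                "successor_time": t2,
                "duration": t2 - t1,
                "cost": (c1 + c2) / 2,         # assumption: average of the two event costs
            })
    return flows

# Hypothetical sample data for one process instance.
events = [
    (1, "Order", datetime(2012, 2, 15, 9, 0), 10.0),
    (1, "Delivery", datetime(2012, 2, 16, 9, 0), 30.0),
    (1, "Invoice", datetime(2012, 2, 16, 12, 0), 0.0),
]
flows = flow_instances(events)
```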

Alternatively or additionally, new attribute values may be calculated for each flow type. The new attributes for flow type may include one or more of the following:

- Total number of flow instances corresponding to the flow type (=all occurrences);
- Total number of process instances that include a flow instance that belongs to this flow type (=unique only);
- Average duration of flow instances belonging to this flow type, plus the standard deviation;
- Median duration of flow instances belonging to this flow type;
- Total cost and average cost of a cost attribute for each flow instance belonging to this flow type; and/or
- Weighted average duration, wherein the duration of each flow is weighted with the cost of each flow when calculating the average duration.
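The flow type attributes listed above can be computed roughly as follows. The sketch assumes flow instances given as (case identifier, flow type, duration, cost) tuples with durations as plain numbers (for example hours). The field names are hypothetical, and the population standard deviation is chosen here only so that a flow type with a single occurrence does not raise an error.

```python
from collections import defaultdict
from statistics import mean, median, pstdev

def flow_type_stats(flows):
    """Aggregate flow instances into per-flow-type attributes."""
    grouped = defaultdict(list)
    for case_id, flow_type, duration, cost in flows:
        grouped[flow_type].append((case_id, duration, cost))
    stats = {}
    for flow_type, rows in grouped.items():
        durations = [d for _, d, _ in rows]
        total_cost = sum(c for _, _, c in rows)
        stats[flow_type] = {
            "occurrences": len(rows),                                 # all occurrences
            "process_instances": len({case for case, _, _ in rows}),  # unique only
            "avg_duration": mean(durations),
            "std_duration": pstdev(durations),
            "median_duration": median(durations),
            "total_cost": total_cost,
            "avg_cost": total_cost / len(rows),
            # Each duration is weighted with the cost of its flow; undefined if all costs are zero.
            "weighted_avg_duration": (
                sum(d * c for _, d, c in rows) / total_cost if total_cost else None
            ),
        }
    return stats

# Hypothetical sample data: case 1 repeats the same flow twice.
flows = [
    (1, ("Order", "Delivery"), 2.0, 10.0),
    (1, ("Order", "Delivery"), 4.0, 30.0),
    (2, ("Order", "Delivery"), 6.0, 20.0),
]
stats = flow_type_stats(flows)[("Order", "Delivery")]
```

The same aggregation applies to event types if each event instance carries two date-time values from which a duration is derived.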

Alternatively or additionally, new attribute values may be calculated for each event type. The new attributes for event type may include one or more of the following:

- Total number of event instances corresponding to the event type (=all occurrences);
- Total number of process instances that include an event instance belonging to this event type (=unique only);
- Average duration of event instances belonging to this event type, plus the standard deviation, calculated from two selected date-time values of each event instance data set;
- Median duration of event instances belonging to this event type;
- Total cost and average cost of a cost attribute of each event instance belonging to this event type; and/or
- Weighted average duration where the duration of each event is weighted with the cost of each event when calculating the average duration.

A still further optional feature of the invention comprises creating an analysis report showing information on event instance data sets, process instances, flow instances, event types and flow types. The analysis report may show such information graphically and/or numerically.

Creation of the analysis report may further include drawing a symbol for each discovered event type and/or drawing a directed connector symbol that connects each two event types in a case wherein there is a flow type from a predecessor event type to a successor event type. The analysis report may further attach at least one attribute value of the event type to the symbol representing the event type. For example, the attribute value may indicate a name, duration, quantity (number or amount) of something, or the like. Alternatively or additionally, the analysis report may attach at least one attribute value of the flow type to the symbol representing the flow type. The examples for flow type attributes may be similar to the examples for event type attributes.

A still further optional feature of the invention comprises a method wherein new events are added to the analysis. This adding can be done by importing several events in batches or it can be done by receiving individual events whenever they become available. For example an RFID reader may notice an event related to a delivery of a particular product whenever the product is loaded onto a ship. The RFID reader may then send this event to an RFID server, which in turn sends the event with some event attributes to the process analysis server. In the analysis server, the adding of one event may then cause a change to analysis results and to cached information about event orders and process variation, and may for example cause an alert message to be sent. Also the adding of a new event may result in information related to that particular event being sent to, and possibly also shown in, client terminals.

Yet another optional feature of the invention comprises a method wherein filtering is done by using a parameter for the analysis. This parameter can be for example the relative amount of process instances having an event of a particular event type compared to the total amount of process instances. For example the event type “Offer Changed” may occur in 7% of the process instances in a sales process. There could be for example a visibility parameter such that an event type is shown only in case its relative occurrence is more than 10%, in which case the event type “Offer Changed” would not be visible. In case the parameter is 5% then the “Offer Changed” event type would be visible. Another example is limiting the depth of a graph, ie, for example only showing a maximum of 5 levels in a process chart starting from a given event type.
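The visibility parameter can be illustrated with a short sketch. It assumes events given as (case identifier, event type) pairs; the function name and data layout are hypothetical. The sample data reproduces the “Offer Changed” example, where the event type occurs in 7 of 100 process instances.

```python
from collections import defaultdict

def visible_event_types(events, min_share):
    """Return the event types whose relative occurrence exceeds min_share.

    Relative occurrence = process instances containing the event type
    divided by the total number of process instances.
    """
    cases_by_type = defaultdict(set)
    all_cases = set()
    for case_id, event_type in events:
        cases_by_type[event_type].add(case_id)
        all_cases.add(case_id)
    return {
        event_type
        for event_type, cases in cases_by_type.items()
        if len(cases) / len(all_cases) > min_share
    }

# Hypothetical sales process: 100 cases, 7 of which contain "Offer Changed".
events = [(case, "Offer Sent") for case in range(100)]
events += [(case, "Offer Changed") for case in range(7)]

shown_at_10_percent = visible_event_types(events, 0.10)
shown_at_5_percent = visible_event_types(events, 0.05)
```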

Yet another feature of the invention is support for benchmarking. Benchmarking can be done for example by first creating at least two sets of process instances and then creating an analysis that shows differences and/or similarities between these sets. These sets may be created for example by utilizing one process instance attribute so that the value of such attribute will determine directly or indirectly the set to which the process instance belongs. On the other hand, the sets may be created by filtering process instances separately for the first set and the second set, so that the sets may contain the same process instances. Or it can be a comparison between a large set of process instances and a smaller subset of that particular set, for example the process instances for which the duration was longer than average. Yet another way to build a benchmark set is to include different events in the sets. In this way the process instances may be the same in all benchmark sets, but one set may contain events of, for example, a particular event type while another set does not include those events. As part of this feature, the analysis can be done for example by showing the difference in a particular duration, amount, deviation, cost, existence or probability of a certain event, event type, process instance, event attribute, or process instance attribute. A flowchart can for example show as a duration the difference of the durations. A tree-like graph can show the process variation tree separately for all benchmarked sets. A tabular analysis report may include the benchmark set name as one column and then attributes related to that tabular analysis report as separate columns, so that there are separate values for benchmarked analysis sets in separate rows in the tabular report.
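One of the benchmarking variants described above, namely splitting process instances into sets by the value of one process instance attribute and reporting each set as a row, might look like the following sketch. The attribute names ("region", "duration") and the function names are assumptions made for illustration only.

```python
from statistics import mean

def benchmark_sets(instances, attribute):
    """Split process instances into benchmark sets by the value of one attribute."""
    sets = {}
    for instance in instances:
        sets.setdefault(instance[attribute], []).append(instance)
    return sets

def benchmark_report(instances, attribute, measure):
    """Tabular report: one row per benchmark set (set name, instance count, average measure)."""
    rows = []
    for name, members in sorted(benchmark_sets(instances, attribute).items()):
        rows.append((name, len(members), mean([m[measure] for m in members])))
    return rows

# Hypothetical process instances with a duration in days.
instances = [
    {"id": 1, "region": "North", "duration": 4.0},
    {"id": 2, "region": "North", "duration": 6.0},
    {"id": 3, "region": "South", "duration": 9.0},
]
report = benchmark_report(instances, "region", "duration")
# Difference of the average durations, as a flowchart might display it.
difference = report[1][2] - report[0][2]
```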

A still further optional feature of the invention comprises a method wherein the system is used to predict probabilities for certain events for process instances. For example a sales process analysis system may calculate the probability of a sales case being Won or Lost based on the prior events loaded into the system for other sales cases and this particular sales case. The system could also assist for example the salesperson in selecting the most beneficial activity, ie, what activity he or she should do next, and for which process instance, to maximize his/her sales pipeline value and the probability of sales cases being Won. This could also be done automatically so that the system will perform an activity when a certain result appears in the analysis results.

A still further optional feature of the invention comprises a method wherein the analysis report contains a calculation of influencers for process instances. This could be for example a calculation of what information in the whole source data set seems to increase or decrease the likelihood of an individual process instance having a certain characteristic, or a prediction of the process instance acquiring that characteristic in the future. This could be done for predicting purposes and the timeframe could also be a particular period like “December/2012”.

A still further optional feature of the invention comprises a method wherein there exists a user client program that sends a command related to any individual item of information visible in any of the generated analyses so that an action is performed. For instance, the action may comprise one or more of the following:

- Open a new analysis: this creates and opens a new analysis for the selected information
- Open new analysis with exclude: this modifies the source data set for the analysis so that only certain events and process instances corresponding to the object in the analysis result are included in the next analysis
- Send an alert message to another server, web service or a user, for example as an e-mail, SMS message or web service message
- Write entry to a log file or database
- Initiate a workflow, for example with parameters configured based on the analysis results
- Update a report in a streaming broadcast system

A collection of event instance data sets can be grouped into a model. Every model can have its own attributes such as model name, access control definitions, view definitions, publicity settings etc.

Access control and user rights management can be implemented for example using a role-based access control mechanism.

The model views could be implemented as filters applied to all the event instance data sets in the model. The result of this filtering process, which is a new set of event instance data sets, can be used as the event instance data set source for creating analyses.

Event instance data set attributes could be saved for example into a relational database so that there are two tables: attribute type and attribute value. The type table defines unique identifiers for attributes, attribute names and their types, which could be used to determine whether the attribute is used as an attribute bound to events (referred to as event attributes later in this document), an attribute bound to process instances (referred to as process attributes later in this document) or even an attribute bound to models (referred to as model attributes later in this document). The attribute value table defines, in addition to the actual attribute value (which could be stored for example as a kind of variant object), the attribute type identifier and the identifier of the object it is bound to. Attributes may also be added for discovered event types and analysis results.
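A minimal sketch of the two-table layout follows, using an in-memory SQLite database through Python's standard sqlite3 module. The table and column names are assumptions; an untyped SQLite column stands in for the variant object mentioned above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE attribute_type (
    id       INTEGER PRIMARY KEY,
    name     TEXT NOT NULL,
    -- what kind of object the attribute is bound to
    bound_to TEXT NOT NULL CHECK (bound_to IN ('event', 'process', 'model'))
);
CREATE TABLE attribute_value (
    type_id    INTEGER NOT NULL REFERENCES attribute_type(id),
    object_id  INTEGER NOT NULL,   -- identifier of the event, process instance or model
    attr_value                     -- untyped column: stores numbers or text as given
);
""")

# Hypothetical attribute types and values.
conn.executemany(
    "INSERT INTO attribute_type VALUES (?, ?, ?)",
    [(1, "cost", "event"), (2, "region", "process")],
)
conn.executemany(
    "INSERT INTO attribute_value VALUES (?, ?, ?)",
    [(1, 42, 12.5), (2, 7, "EMEA")],
)

def attributes_of(kind, object_id):
    """Return {attribute name: value} for one object of the given kind."""
    rows = conn.execute(
        """SELECT t.name, v.attr_value
           FROM attribute_value AS v
           JOIN attribute_type AS t ON t.id = v.type_id
           WHERE t.bound_to = ? AND v.object_id = ?""",
        (kind, object_id),
    ).fetchall()
    return dict(rows)

event_attributes = attributes_of("event", 42)
process_attributes = attributes_of("process", 7)
```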

The interface between analysis client and server could be implemented for example in such a fashion that it is possible to configure a server to act as a client for another server and thus to delegate all or some of the analysis requests to be processed by the next server, which in turn can do the same delegation if desired. It may even be possible to run the server and client as programs or processes on the same physical hardware. Such a configurable mechanism could be implemented using a common interface which is implemented by both the “analysis server engine” and the “analysis server client”. The client itself just communicates with the server through this common interface.
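The common-interface idea can be sketched as below. The class and method names are hypothetical, and the in-process call inside AnalysisServerClient stands in for whatever network transport a real deployment would use.

```python
from abc import ABC, abstractmethod

class AnalysisService(ABC):
    """Common interface shared by the server engine and the server client."""

    @abstractmethod
    def analyze(self, request: dict) -> dict:
        ...

class AnalysisServerClient(AnalysisService):
    """Client-side proxy; here it simply forwards to a server object in the same process."""

    def __init__(self, server: AnalysisService):
        self.server = server

    def analyze(self, request: dict) -> dict:
        return self.server.analyze(request)

class AnalysisServerEngine(AnalysisService):
    """Handles the request kinds it is configured for and delegates the rest."""

    def __init__(self, name, handled_kinds, next_server=None):
        self.name = name
        self.handled_kinds = set(handled_kinds)
        self.next_server = next_server  # any AnalysisService, typically a client proxy

    def analyze(self, request: dict) -> dict:
        if request["kind"] in self.handled_kinds:
            return {"kind": request["kind"], "handled_by": self.name}
        if self.next_server is not None:
            return self.next_server.analyze(request)  # act as a client of the next server
        raise LookupError(f"no server handles {request['kind']!r}")

# Hypothetical two-level deployment: the front server delegates flowcharts to the backend.
backend = AnalysisServerEngine("backend", {"variation", "flowchart"})
front = AnalysisServerEngine("front", {"variation"}, next_server=AnalysisServerClient(backend))
client = AnalysisServerClient(front)

local_result = client.analyze({"kind": "variation"})
delegated_result = client.analyze({"kind": "flowchart"})
```

Because the engine only depends on the shared interface, the chain can be extended or collapsed by configuration alone.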

It could also be possible to implement a mechanism for event, process and model attributes to be re-used by adding a special relation between models. This relation could be used in such a fashion that if such a relation exists between two models, the attributes of objects in one model are also visible for objects in the other model, provided that there exists a similar object in both models. Object similarity in this case is determined for example by matching process instance names (which are read from some process attribute) or event instance names (which are read from some event attribute) with each other.

There could also be a mechanism for logging all the server requests together with some or all of the given request parameters. This logging mechanism could be used for example for tracking model changes and analysing service behaviour in all kinds of situations. This log could also be used as an event source for process analysis for analysing different aspects of the server behaviour under different circumstances. This mechanism could be further extended to also track the progress of long-lasting requests by adding progress counter fields to log entries that are updated during the request processing. The current progress counter status can be queried asynchronously by the client whenever desired. Also this mechanism could be further extended to allow cancelling request processing by allowing the client to set a specific marker in request log entries, which is periodically checked by the request processor. If the request processor sees that such a marker is set, the request processing will be aborted immediately.
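The progress counter and the cancellation marker can be sketched with an in-memory stand-in for the request log. All names here are hypothetical; a real implementation would presumably keep these fields in database log entries rather than in a dictionary.

```python
import threading

class RequestLog:
    """In-memory stand-in for the request log (illustrative assumption)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._entries = {}

    def start(self, request_id, params):
        with self._lock:
            self._entries[request_id] = {
                "params": params, "progress": 0, "cancel": False, "status": "running",
            }

    def set_progress(self, request_id, value):
        with self._lock:
            self._entries[request_id]["progress"] = value

    def request_cancel(self, request_id):
        """Called by a client to set the cancellation marker."""
        with self._lock:
            self._entries[request_id]["cancel"] = True

    def finish(self, request_id, status):
        with self._lock:
            self._entries[request_id]["status"] = status

    def snapshot(self, request_id):
        """Called by a client at any time to query the current progress."""
        with self._lock:
            return dict(self._entries[request_id])

def process_request(log, request_id, items, work):
    """Process items one by one, checking the cancellation marker before each item."""
    log.start(request_id, {"item_count": len(items)})
    for done, item in enumerate(items, start=1):
        if log.snapshot(request_id)["cancel"]:
            log.finish(request_id, "aborted")
            return False
        work(item)
        log.set_progress(request_id, done)
    log.finish(request_id, "done")
    return True

# Demonstration: the cancellation marker is set while item 3 is being processed.
log = RequestLog()

def work(item):
    if item == 3:
        log.request_cancel("r1")

completed = process_request(log, "r1", [1, 2, 3, 4, 5], work)
final_entry = log.snapshot("r1")
```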

In order to minimize the resources required for storing events with cached information, a mechanism could be added to periodically remove all cache data fulfilling certain criteria. These criteria could include for example that all event caches created for a view of a model which is not published to anybody other than the one who created the view should be deleted if they are not used within a specific amount of time after their previous usage.
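The example criterion above might be expressed as follows. The EventCache record and its fields are assumptions introduced for illustration; the sketch only selects the caches to remove and leaves the actual deletion to the storage layer.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class EventCache:
    view_id: str
    owner: str
    published: bool      # True if the view is visible to users other than its creator
    last_used: datetime

def evictable(caches, now, max_idle):
    """Select caches of unpublished views that have been idle longer than max_idle."""
    return [c for c in caches if not c.published and now - c.last_used > max_idle]

# Hypothetical cache inventory evaluated with a 7-day idle limit.
now = datetime(2012, 2, 15, 12, 0)
caches = [
    EventCache("view-a", "alice", published=False, last_used=now - timedelta(days=30)),
    EventCache("view-b", "alice", published=True, last_used=now - timedelta(days=30)),
    EventCache("view-c", "bob", published=False, last_used=now - timedelta(hours=1)),
]
to_delete = evictable(caches, now, max_idle=timedelta(days=7))
```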

The event cache can be implemented for example as a separate relational database table for every view of a model, thus making it very easy and efficient to create new event caches and delete unneeded ones.

The server could be implemented as a web service, which enables very flexible deployment methods. For example it allows deploying it into a cloud server farm such as Microsoft Azure or onto a single server inside a corporate domain. The web service could be implemented in such a fashion that it allows importing event instance data sets easily, for example using automatic integration tools that periodically download event instance data sets from some external system such as a CRM system and import them to the server via its web service interface.

The analysis results returned from the server could be implemented in such a fashion that they always include all the information required to recreate the same analysis later and possibly continue the iterative analysis process from the point at which the analysis was made. In addition to this, all the analysis results could include some common set of information such as model name, the model's last modified date and the name of the user who last modified the model.

To speed up the analysis generation process, the server could allow clients to define sampling criteria that select only a subset of all the event instance data sets in some view of some model. A sampling criterion could be for example “use only 50% of all the event instances”, which could cause the server to do the analysis for only some 50% of all the event instances in the view of the model.
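A sampling criterion of the “use only 50%” kind could be applied along these lines. This is a sketch under the assumption that uniform random sampling of event instances is acceptable; the function name and the fixed seed, used only to make the example repeatable, are illustrative choices.

```python
import random

def sample_events(events, fraction, seed=0):
    """Return roughly `fraction` of the events, chosen uniformly, in their original order."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in the interval (0, 1]")
    count = round(len(events) * fraction)
    rng = random.Random(seed)  # seeded so that repeated requests see the same sample
    chosen = sorted(rng.sample(range(len(events)), count))
    return [events[i] for i in chosen]

# Hypothetical event identifiers.
events = [f"event-{n}" for n in range(10)]
half = sample_events(events, 0.5)
```

Sampling whole process instances instead of individual events would keep predecessor-successor chains intact; which of the two is appropriate depends on the analysis.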

A still further optional feature of the invention comprises a method wherein, as the analysis is presented to one or more clients, those clients provide additional information such as a comment related to the analysis or any part of it, a vote on a predefined scale about for example the importance or severity of the finding, or agreement/disagreement with a previously provided comment or statement. Such information about the usage could then be sent to and stored in the server and the information could be used for making further analysis. Also the information about usage, for example how many times a particular analysis has been viewed, can be used in prioritizing the importance of analyses compared to each other.

BRIEF DESCRIPTION OF THE DRAWINGS

In the following the invention will be described in greater detail by means of specific embodiments with reference to the attached drawings, in which:

FIG. 1 shows an overall view of an environment wherein the invention can be utilized;

FIG. 2 shows how the computer-implemented process analysis system can be coupled to the ERP system;

FIG. 3 schematically shows an exemplary block diagram for the database server of the process analysis system;

FIG. 4, which contains sub-FIGS. 4(A) through 4(E), shows examples of SQL scripts for initializing and populating various tables in the database of the process analysis system;

FIG. 5, which contains sub-FIGS. 5(A) and 5(B), shows how a cached and ordered relational database table, which has been populated by a script as shown in FIG. 4, can be used to speed up and simplify processing of queries inside the database engine itself;

FIG. 6, which contains sub-FIGS. 6(A) through 6(E), shows examples of process charts of varying detail level, wherein the detail level is based on filtering out rarely-occurring process variations;

FIG. 7, which contains sub-FIGS. 7(A) through 7(C), shows how user-settable filtering may be utilized for detecting problem spots in the real-world process;

FIG. 8, which contains sub-FIGS. 8(A) and 8(B), illustrates concepts relating to identification of processes;

FIG. 9, which contains sub-FIGS. 9(A) through 9(E), illustrates the concept and use of process variations; and

FIG. 10 illustrates a causal model, which is a key feature in the fifth feature set and an optional feature in other feature sets.

DETAILED DESCRIPTION OF THE SPECIFIC EMBODIMENTS

FIG. 2 shows how the computer-implemented process analysis system 1-300 can be coupled to the information management system 1-200. In the following, the term “ERP” will be used as an illustrative but non-restrictive example of information management systems that support the real-world process 1-100. Reference sign PA-C generally denotes clients, which may be workstations or other processes accessing the process analysis system 1-300.

There are basically two ways to transfer data relating to the logistic processes 1-100 from the ERP system 1-200 to the process analysis system 1-300. For instance, it is possible to provide the ERP system (or other information management system) with one or more data-mining plug-ins, one of which is denoted by reference numeral 2-210. The data-mining plug-ins are configured to find data relevant for process analysis within the ERP system and transfer such data to the process analysis system 1-300. Alternatively or additionally, it is possible to provide the clients with event detection plug-ins 2-220 configured to detect events of interest in the ERP system and replicate the events into the process analysis system 1-300. In some exemplary implementations, the event detection plug-ins may comprise RFID readers installed at checkout counters, warehouse exits or the like.

FIG. 2 further illustrates the concepts of analysis result set and presentation of analysis in an interactive manner. Reference numeral 2-300 denotes the act of forming an analysis result set in the process analysis server system PA-S. Arrow 2-310 illustrates sending the process analysis result set from the process analysis server PA-S to a specific one of the process analysis client computers PA-C. The process analysis server PA-S produces a specific analysis result set 2-310 for each client computer PA-C. Arrow 2-320 depicts presentation of the analysis by the client computer PA-C. Arrow 2-330 relates to the second feature set of the invention, wherein the client computer PA-C, in response to receiving an input related to the analysis result, sends a request to the process analysis server PA-S. As a result of the request 2-330, the process analysis server PA-S forms filtered event instance data by excluding event instance data sets from the analysis based on the request 2-330. Based on the filtered event instance data the process analysis server PA-S then repeats the above steps of forming the analysis result 2-300 and sending the analysis result 2-310, while the client computer PA-C re-presents the analysis 2-320 based on the filtered event instance data.

FIG. 3 schematically shows an exemplary block diagram for the database server system SS of the process analysis system. The two major functional blocks of the database server system SS are a database server computer 3-100 and a disk system 3-190. The server computer 3-100 comprises one or more central processing units CP1 . . . CPn, generally denoted by reference numeral 3-110. Embodiments comprising multiple processing units 3-110 are preferably provided with a load balancing unit 3-115 that balances processing load among the multiple processing units 3-110. The multiple processing units 3-110 may be implemented as separate processor components or as physical processor cores or virtual processors within a single component case. The server computer 3-100 further comprises a network interface 3-120 for communicating with various data networks, which are generally denoted by reference sign DN. The data networks DN may include local-area networks, such as an Ethernet network, and/or wide-area networks, such as the internet. The server system SS serves one or more process analysis work stations PA-WS, via the data networks DN.

The server computer 3-100 of the present embodiment also comprises a user interface 3-125. Depending on implementation, the user interface 3-125 may comprise local input-output circuitry for a local user interface, such as a keyboard, mouse and display (not shown). Alternatively or additionally, management of the server computer 3-100 may be implemented remotely, by utilizing the network interface 3-120 and a terminal similar to the process analysis work stations PA-WS. The nature of the user interface depends on which kind of computer is used to implement the server computer 3-100. If the server computer 3-100 is a dedicated computer, it may not need a local user interface, and the server computer 3-100 may be managed remotely, such as from a web browser over the internet, for example. Such remote management may be accomplished via the same network interface 3-120 that the server computer utilizes for traffic between itself and the client terminals.

The server computer 3-100 also comprises memory 3-150 for storing program instructions, operating parameters and variables. Reference numeral 3-160 denotes a program suite for the server computer 3-100.

The server computer 3-100 also comprises circuitry for various clocks, interrupts and the like, and these are generally depicted by reference numeral 3-130. The server computer 3-100 further comprises a disk interface to the disk system 3-190. The various elements 3-110 through 3-150 intercommunicate via a bus 3-105, which carries address signals, data signals and control signals, as is well known to those skilled in the art.

The inventive method may be implemented in the server system SS as follows. The program suite 3-160 comprises program code instructions for instructing the set of processors 3-110 to execute the functions of the inventive method, wherein the functions include performing the process analysis functions according to the invention and/or its embodiments. Specifically, the functions of the inventive method include the acts defined in claim 1.

FIG. 4, which contains sub-FIGS. 4(A) through 4(E), shows examples of SQL scripts for initializing various tables in the database of the process analysis system 1-300 and for populating some of those tables with data originating from the ERP system 1-200.

Reference numeral 4-100 denotes an SQL table definition for a process analysis event table (“PA_EVENT”). In this example, the event table of the analysis system 1-300 will be populated with events (event instance data sets) from the ERP system 1-200. In other words, the event data will be copied from the ERP system 1-200 to the analysis system 1-300. It is worth noting that in the present example, the definition 4-100 for the PA event table does not contain any explicit references to any particular processes. Instead of explicit process identifiers that might tie specific event instances to predefined processes, the present embodiment supports dynamic definition and redefinition of processes in real time. In other words, “processes” (for the purposes of process analysis) can be defined on-the-fly, arbitrarily.

The SQL table definition 4-100 contains a few data items (columns or fields) worth mentioning. Reference numeral 4-110 denotes a process instance identifier, reference numeral 4-115 denotes an example of an event type identifier, reference numeral 4-120 denotes an exemplary timestamp and reference numeral 4-125 denotes an exemplary cost item associated with an event. The cost item, which can be used to indicate resource consumption (monetary or otherwise), may be used in analysis and optimization of processes.

Reference numeral 4-200 denotes an SQL script for creating a temporary cache table that is to contain event order numbers within process instances. Use of both ascending and descending ordering (“EVT_ORDER”, “EVT_ORDER_DESCENDING”) expedites tracking predecessor-successor relations.

Reference numeral 4-250 denotes an exemplary SQL script for populating the temporary cache table that is to contain event data ordered both in ascending and descending order. Reference numerals 4-261 and 4-262 denote script lines that respectively order the event data in ascending and descending order. Reference numeral 4-264 denotes the table from which the temporary cache table is populated. For the first execution the table 4-264 references table 4-100, and for the subsequent executions it references table 4-200.
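The scripts of FIG. 4 are not reproduced in this text, so the following is only a hedged illustration of what ascending and descending order numbers within process instances amount to. It assumes events given as (case identifier, event type, start time) tuples; the function name and the output layout are hypothetical.

```python
from collections import defaultdict

def add_event_order(events):
    """Attach ascending and descending order numbers within each process instance.

    Output rows: (case_id, event_type, start, order_ascending, order_descending).
    order_ascending == 1 marks the first event of a case,
    order_descending == 1 marks the last event of a case.
    """
    by_case = defaultdict(list)
    for case_id, event_type, start in events:
        by_case[case_id].append((start, event_type))
    ordered = []
    for case_id, evts in by_case.items():
        evts.sort()  # order by start time within the case
        count = len(evts)
        for position, (start, event_type) in enumerate(evts, start=1):
            ordered.append((case_id, event_type, start, position, count - position + 1))
    return ordered

# Hypothetical sample data, deliberately not in chronological order.
events = [(5, "Delivery", 25), (5, "Order", 10), (5, "Invoice", 40), (6, "Order", 12)]
ordered = add_event_order(events)
```

Having both numbers available means the first and the last event of any case can be found with a simple equality test rather than a per-case aggregate.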

Reference numeral 4-300 denotes an SQL table definition for a cached event table (“PA_CACHE_EVENT”) for the purposes of process analysis. The cached event table is distinct from the temporary cache table populated by the script 4-250. The event cache table is automatically created based on selected view settings. A view setting, which may include a model identifier, may define which event types (=activities), cases and variations are shown and which are hidden from view. The idea here is to create an ordered table of transitions between event instances, wherein the ordering is based on comparison of start times between the event instances. It is thus the comparison of the start time between the event instances that serves as the basis for determining predecessor-successor relations. Reference numeral 4-310 denotes a data item (column) which can be used as the order information. Note the columns starting with “evt” (for event) and “evt_next” (for next event). Reference numeral 4-320 denotes a column for defining a successor event type, while reference numeral 4-330 denotes a column for defining a successor start time. It is worth noting here that in principle the duration of an event instance is of primary importance in process analysis but it is not necessary, or even beneficial, to store the duration of an event as a single quantity for each event. This is because the interactive filtering according to the second feature set may exclude information on a very detailed level, which is why some events may not be displayed. Accordingly, it is advantageous to be able to dynamically compute the duration of an event as the difference between the successor start time 4-330 and the event start time 4-120 (shown in FIG. 4(A)).

Reference numeral 4-350 denotes an exemplary SQL script for populating the event cache table, ie, the one initialized by the script 4-300. The script 4-350 uses a self-join to collect both ends of all event transitions into a single row of the event cache table. Reference numeral 4-370 begins a SELECT statement used for the majority of events wherein a next event exists. The SELECT statement contains a JOIN verb 4-372 joining table #PA_EVENT_WITH_ORDER with itself, using alias names E01 and E02. Reference numeral 4-374 denotes a criterion based on the event orders that establishes a predecessor-successor (previous event-next event) relation. Reference numeral 4-376 denotes a criterion that the events must share a common case (process instance) identifier.
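The self-join can be illustrated with a small, self-contained sketch using SQLite through Python's sqlite3 module. It is not the script 4-350 itself: the table and column names below are assumptions loosely modelled on the names mentioned in the text, and only the case in which a next event exists is covered.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE pa_event_with_order (
    case_id INTEGER, evt_type TEXT, evt_start INTEGER, evt_order INTEGER
);
CREATE TABLE pa_cache_event (
    case_id INTEGER, evt_order INTEGER,
    evt_type TEXT, evt_start INTEGER,
    evt_next_type TEXT, evt_next_start INTEGER
);
""")

# Hypothetical ordered events; evt_order is the ascending order within each case.
conn.executemany(
    "INSERT INTO pa_event_with_order VALUES (?, ?, ?, ?)",
    [(5, "Order", 10, 1), (5, "Delivery", 25, 2), (5, "Invoice", 40, 3),
     (6, "Order", 12, 1), (6, "Invoice", 30, 2)],
)

# Self-join: e01 is the predecessor, e02 the successor in the same case.
conn.execute("""
INSERT INTO pa_cache_event
SELECT e01.case_id, e01.evt_order,
       e01.evt_type, e01.evt_start,
       e02.evt_type, e02.evt_start
FROM pa_event_with_order AS e01
JOIN pa_event_with_order AS e02
  ON e02.case_id = e01.case_id              -- common case (process instance) identifier
 AND e02.evt_order = e01.evt_order + 1      -- predecessor-successor relation
""")

# Second event of case 5, with its duration computed on the fly
# as successor start time minus event start time.
second_event = conn.execute(
    """SELECT evt_type, evt_next_type, evt_next_start - evt_start
       FROM pa_cache_event
       WHERE case_id = 5 AND evt_order = 2"""
).fetchone()
transition_count = conn.execute("SELECT COUNT(*) FROM pa_cache_event").fetchone()[0]
```

Once both ends of a transition sit in one row, a lookup such as the one above needs no further joins or sorting at query time.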

FIG. 5(A) shows an exemplary user interface 5-100 for real-time interactive analysis of “processes” whose event data has been transferred from the ERP system 1-200 to the analysis system 1-300. In particular, the event data has been imported to the PA_CACHE_EVENT_ORDERED table that was populated by the script 4-350 shown in FIG. 4.

The quotation marks around “processes” signify the fact that the ERP system 1-200 does not contain any explicit process indicators, and a considerable amount of processing has to be performed in order to arrive at definitions for “process” that permit meaningful analysis of processes. The reader is reminded of the fact that in addition to the arguably obvious question of how to find out the cognitive information for defining meaningful “processes”, there remains the technical problem of speeding up process analysis to a level wherein interactive analysis of arbitrarily defined processes is possible. It may be tempting to believe that caching of frequently-used data is the omnipotent solution to speed-related problems. On the other hand, if everything is cached, then any change in the data of the ERP system 1-200 renders all data in the analysis system 1-300 obsolete, and updating of the cached data will emerge as the next bottleneck. The technical problem to be solved is thus: precisely what should be cached, and how, in order to enable interactive analysis of arbitrarily defined processes in a system where the original data (the data of the ERP system) does not contain useful definitions for processes.

FIG. 5 contains sub-FIGS. 5(A) and 5(B), both of which are screenshots from Microsoft® SQL Server Management Studio, which can be used to profile and test SQL queries. The queries shown in FIGS. 5(A) and 5(B) yield the same result although they use different approaches, namely cached and non-cached tables for event data, respectively. Reference numeral 5-120 denotes an exemplary query that may be entered via the user interface 5-100. The query 5-120 retrieves event data from the PA_CACHE_EVENT_ORDERED table that was populated by the script 4-350. Specifically, the query 5-120 selects the second event of a case (process instance) identified by case identifier 5.

Reference numeral 5-130 denotes an estimated query plan created for evaluating the query 5-120 using SQL Server 2008 database. As can be seen from FIG. 5(A), processing the query involves two quite simple processing steps.

For the sake of comparison, FIG. 5(B) shows an alternative scenario in which a query 5-170, basically similar to the query 5-120, is applied to a non-cached table. Reference numeral 5-180 denotes an estimated query plan created for evaluating the query 5-170 using SQL Server 2008 database. As can be seen from FIG. 5(B), processing the query involves nine separate processing steps.

What may not be directly evident from FIG. 5(B), but should be apparent to those skilled in the art of SQL processing, is that many of the steps involved in the processing of the query 5-170 require so many database engine resources when large amounts of data are being evaluated that the operation is generally unsuitable for interactive process analysis. The decrease in the number of processing steps and the corresponding increase in processing speed are made possible by an optimized selection of which data tables are cached and which are not.

Specifically, the pre-calculated and cached data should include order numbers in ascending and/or descending order for individual events (event instances) within each process instance, as was described in connection with FIG. 4(E).

Filtering with Parameters

FIG. 6, which contains sub-FIGS. 6(A) through 6(E), shows examples of process charts of varying detail level, wherein the detail level is based on filtering out rarely-occurring process variations. FIG. 6(A) shows a self-explanatory chart 6-100 for describing process variations, which is presented by way of introduction. At this point of this patent specification, the chart 6-100 is presented as a desired goal, and a detailed description of the implementation will be provided later. The chart 6-100 provides an intuitive view into various processes in which orders from customers are processed into cash flow. The chart 6-100 intuitively shows that most processes follow the path defined by events 6-111 through 6-115. This is evident from the varying thickness of the flow arrows that connect the various events. The chart 6-100 shows event classes but not event instances, such that, say, “outbound delivery” is an event class, whereas an “outbound delivery for a specific order at a certain date/time”, etc., is an event instance.

A specific reason as to why the chart 6-100 shown in FIG. 6(A) is so intuitive can be understood by comparing it with chart 6-200 shown in FIG. 6(B). The chart 6-200 shown in FIG. 6(B) shows every different process variation and every flow, wherein a process variation is defined by the chain of event types or event classes traversed by the process. In other words, if a number of process instances (each with specific order numbers, delivery dates, etc.) traverse through exactly the same event types in the same order, those process instances are said to form a process variation. The ERP system for a big company that delivers products to customers may store events in respect of millions of process instances. Each chain of event instances from order (or bid, for example) to invoicing may be considered a process. Any process instances that traverse through the same event types in exactly the same order belong in the same process variation. The concept of process variations makes it possible to show, in a typical case, all the possible process variations on a really large computer screen or printout. Unfortunately the nature of FIG. 6(B) makes it impossible to comply with the rules for minimum character height in drawings. But the idea is not to show the legend for each event class. Instead the idea is to demonstrate the problem that showing too much detail obscures information relevant to, say, finding processing bottlenecks.

In some computer-aided design or analysis applications, zooming in and out are frequently used to find order in chaos. FIG. 6(C) shows a zoomed-in view 6-300 of a portion of the chart 6-200 shown in FIG. 6(B). Now it is easy to see why zooming in and out does not necessarily bring order into chaos: virtually all the connections (predecessor-successor relations) in the chart 6-300 begin and/or end outside the chart 6-300.

Let us now return to the question of why the chart 6-100 shown in FIG. 6(A) was so clear and intuitive. Note the two arrows leading out of event type 6-111. The wide arrow leading to event type 6-112, together with its associated legend, shows that 86% of the analyzed process instances follow the path from event type 6-111 (“standard order”) to event type 6-112 (“outbound delivery”), whereas the other, narrower arrow, which leads to event type 6-116 (“purchase order”), holds for 8% of the process instances. Now, 86%+8% do not add up to 100%, and 6% of the process instances have been hidden from view. Within the context of the present invention, this optional feature is called filtering. By means of filtering, it is possible to hide process variations that describe less than a certain threshold percentage of processes.

By way of example, FIGS. 6(D) and 6(E) show two more filtered sets of process variations, wherein the chart 6-400 shown in FIG. 6(D) hides process variations that account for less than 10% of the process instances, while the chart 6-500 shown in FIG. 6(E) hides process variations that account for less than 2% of the process instances. It is self-evident that the percentages (the 10% and 2%) are arbitrary values for the purposes of illustrating this embodiment of the invention. The percentage values for filtering are preferably user-settable.
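The threshold-based filtering described above may be sketched, purely as an illustration, by the following Python function. The function name filter_variations, the textual variation labels and the sample counts are hypothetical; the sketch assumes that the number of process instances per process variation is already known.

```python
def filter_variations(variation_counts, threshold_pct):
    """Hide variations covering less than threshold_pct of all process instances.

    variation_counts maps a variation label to its number of process instances.
    Returns (kept_variations, hidden_percentage).
    """
    total = sum(variation_counts.values())
    if total == 0:
        return {}, 0.0
    kept = {
        variation: count
        for variation, count in variation_counts.items()
        if 100.0 * count / total >= threshold_pct
    }
    hidden_pct = 100.0 * (total - sum(kept.values())) / total
    return kept, hidden_pct


# Hypothetical counts for four process variations (100 process instances)
counts = {"A>B>C": 86, "A>D>C": 8, "A>B>A>C": 4, "A>C": 2}

kept_5, hidden_5 = filter_variations(counts, 5)     # user-settable threshold 5%
kept_10, hidden_10 = filter_variations(counts, 10)  # user-settable threshold 10%
```

With these sample counts, a 5% threshold keeps the two most common variations and hides 6% of the process instances, whereas a 10% threshold keeps only the most common variation.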

FIG. 7, which contains sub-FIGS. 7(A) through 7(C), shows how user-settable filtering may be utilized for detecting problem spots in the real-world process. As indicated by the scenario shown in FIGS. 6(B) and 6(C), mere zooming in to a specific spot in a process chart fails to identify relevant connections between events.

In FIG. 7(A), reference numeral 7-100 denotes a section of a process chart which is basically similar to the process charts shown in FIGS. 6(A) through 6(B). Reference numeral 7-150 denotes a portion of the process chart 7-100, which the user wishes to see in a zoomed-in view. That portion 7-150 is shown magnified in FIG. 7(B) and it contains four events 7-151 through 7-154. In the present scenario, the user wishes to analyse the reason for the surprisingly high percentage of backward process flows identified by the user-settable outlining frame 7-200. The user instructs the analysis system to show more detail in the portion 7-150, and the resulting view 7-300 is shown in FIG. 7(C). As shown in FIG. 7(C), the added detail level reveals three more nodes, denoted by reference numerals 7-155 through 7-157, that were hidden in the previous views. In the scenario shown in FIG. 7(C), the user of the analysis system may be able to see that the reason for the backward flows is the fact that a significant number of customers change their orders after processing of those orders has begun.

Understanding the specific reason for the backward flows is arguably in the realm of cognitive information processing and thus may be irrelevant for the present invention. Instead the present invention relates to technical questions such as how to identify and present processes, or how to provide a user interface that permits varying the detail level locally, as opposed to globally, and how to make an SQL server perform all the processing with sufficient speed for interactive process analysis.

FIG. 8(A) illustrates concepts relating to identification of processes. Reference numeral 8-100 denotes a plurality of event instance data sets. The information management system 1-200, such as an ERP system, typically stores a huge number of such event instance data sets. Each event instance data set 8-100 contains data that relates to an event instance in the logistic process 1-100. For instance, the event instance data sets 8-100 may model or control real-world event instances in the logistic process 1-100. In the exemplary case shown in FIG. 8(A), the event instance data sets 8-100 describe corresponding real-world event instances by specifying what happened (or is to happen), when did/will the event instance start or end, what is the location, state or phase of a resource at the origin and destination of the event instance, etc. Such description data for the real-world event instances are collectively denoted by reference numeral 8-110.

Reference numeral 8-150 denotes a process identified by a process identifier. As briefly stated in the introductory section of this patent specification, it is well known to use computer-implemented analysis tools to find opportunities for improving the efficiency of real-world processes. As regards the present invention, one of the problems relates to the fact that the computer-implemented analysis system may not be directly connectable to the real-world process 1-100, and the ERP system 1-200 or other information management system that supports the real-world process 1-100 may not explicitly identify any real-world “processes”. In a typical case, the ERP system 1-200 may store identifiers for orders placed by customers, whereas the company that fulfils the orders might be interested in improving the efficiency of a manufacturing apparatus or delivery vehicle. It is thus evident that the order number assigned for the customer's order is generally insufficient for identifying “processes” in a manner which could yield meaningful data for improving the efficiency of physical processes. Accordingly, the analysis system 1-300 is preferably able to form explicit process identifiers based on information contained in the event instance data sets 8-100. Reference numeral 8-120 denotes an exemplary process identifier, which the process analysis system 1-300 forms on the basis of the description data 8-110. All event instance data sets 8-100 having the same process identifier thus belong in the same process, although the definition of “process” may be altered dynamically, depending on which kinds of processes are being analysed.

In any real-world processes for which it makes sense to establish a computer-implemented analysis system, the number of individual events and processes is huge. It is therefore desirable to apply some kind of a generalization scheme to find patterns which are typical for specific kinds of events and processes. According to an embodiment of the invention, the process analysis system is capable of generalizing event instances 8-100 to event types 8-200 by utilizing only a subset 8-210 of the description data 8-110. In an illustrative example, the subset 8-210 of description data that identifies an event type (as opposed to event instance) only contains a descriptor of what happened in the event. For example, the event descriptor 8-210 may indicate that the event type is a delivery of an order but ignore the details of the order.

In some embodiments, identification of processes may utilize external data 8-300, which is data other than event data. A data item in the external data 8-300 may describe other items of external data, event instances 8-100 or processes 8-150.

In some implementations the server may create and store a view 8-400 for each filtered set of the event instance data sets. The stored view includes at least information indicating which event instance data sets are included or excluded from the analysis. Each view may be used to create specific analysis result sets and specific presentations of analysis. In short, the view describes what information elements to include (or exclude), based on which criteria and, optionally, various analysis parameters which may specify how to combine or otherwise use information from the event instance data sets. For instance, such use of information may involve computing a weighted cost function, wherein cost indicates consumption of one or more resources, monetary or otherwise.

In a further embodiment of the invention, the process analysis system is capable of detecting and analysing process variations. As used herein, a process variation is defined as an ordered series of events (as classes, not as instances). In other words, all processes having exactly the same events in exactly the same order belong in the same process variation 8-250. Process variations will be discussed in detail in connection with FIG. 9.

FIG. 8(B) illustrates flow instance and flow instance type. As shown in FIG. 8(B), a flow instance 8-500 means a transition between two event instances, denoted herein by reference numbers 8-100₁ and 8-100₂. Correspondingly, a flow instance type 8-550 means a transition between two event types 8-200₁ and 8-200₂.
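As a hedged illustration of the distinction between flow instances and flow instance types, the following Python sketch counts, for each pair of consecutive event types, how many flow instances of that type occur in a set of process instances. The function name flow_type_counts and the sample cases are assumptions of this sketch; each case is assumed to be already sorted chronologically.

```python
from collections import Counter


def flow_type_counts(cases):
    """Count flow instances per flow instance type.

    cases maps a case identifier to its list of event types in
    chronological order. Each pair of consecutive events in a case is
    one flow instance; the pair of their event types is the flow type.
    """
    flows = Counter()
    for event_types in cases.values():
        for source, target in zip(event_types, event_types[1:]):
            flows[(source, target)] += 1
    return flows


# Hypothetical process instances
cases = {
    1: ["standard order", "outbound delivery", "invoice"],
    2: ["standard order", "purchase order", "outbound delivery", "invoice"],
    3: ["standard order", "outbound delivery", "invoice"],
}

flows = flow_type_counts(cases)
```

Counts of this kind could, for example, determine the thickness of the flow arrows in a process chart such as the one shown in FIG. 6(A).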

The relation between the information elements shown in FIGS. 2 and 8(A)-(B) is as follows. The process analysis database PA-DB coupled to the server PA-S contains all the information shown in FIG. 8(A)-(B). Based on some initial request, the server PA-S forms an analysis result set (shown as step 2-300) and sends it to the client PA-C (step 2-310). The client presents an analysis based on the analysis result set (step 2-320) and, based on an input relating to the analysis, sends a request to the server PA-S (step 2-330). Based on the request, the server PA-S then forms filtered versions of the information shown in FIG. 8(A)-(B), including the event instance data sets, sends a revised analysis result set to the client, and the client presents a revised analysis.

FIG. 9, which contains sub-FIGS. 9(A) through 9(E), illustrates the concept and use of process variations. As briefly stated in connection with FIG. 8(A), a process variation is defined as an ordered series of events (as classes or event types, not as event instances). In other words, all processes having exactly the same events in exactly the same order belong in the same process variation. Reference numeral 9-100 denotes an exemplary SQL table definition for a variation table containing all the process variations transferred from the information management system 1-200 to the process analysis system 1-300. Reference numeral 9-200 denotes an exemplary SQL table definition for a process instance table. In the drawings, “case” is used as a shorthand notation for process instance. The process instance table according to the definition 9-200 is referenced by the event table and has a foreign key to the variation table.

FIG. 9(C) shows a flow chart for an algorithm to be processed whenever new events are added to the database. Firstly, all unique case identifiers (process instance identifiers) are collected for all imported events (9-302). Next, for each process instance, the algorithm queries a list of event type identifiers visited by the process instance, and the events returned by the query are sorted by their time stamps (9-304). Then the process analysis system creates an easily searchable representation of these event types (9-306). The reason for creating an easily searchable representation of the event types is that the representation should be searchable by the SQL server of the process analysis system 1-300 (item PA-S in FIG. 2). An illustrative example of an easily searchable representation of the event types will be shown in connection with FIG. 9(D). Next, the process analysis system checks if a process variation with an identical event type path already exists (9-308). If not, the algorithm proceeds to creating a new variation, which is selected as the variation of the process instance (9-310). On the other hand, if a process variation with an identical event type path already exists, it will be selected as the variation of the process instance (9-312). The preceding steps starting from 9-304 are repeated until all selected process instances have been processed.

FIG. 9(D) shows an illustrative but non-restrictive example of an easily searchable representation of the event types. The representation of the event types is generally called a “path” and denoted by reference numeral 9-400. The path 9-400 comprises a path start 9-402, an event type identifier 9-404 for the event type of each event in the process, and a path end 9-408. Between any two event type identifiers, there is a separator 9-406. The symbols for the path start 9-402, separator 9-406, and path end 9-408 must not be used in the event type identifiers 9-404. A benefit of a textual representation of the path (chain of event type identifiers for the event instances of a process instance) is that the paths are easily processed by SQL servers. It is self-evident that as long as these criteria are met, there is a virtually endless variety of suitable textual symbols. The path 9-400 is an exemplary representation of process variations, although other representations may be used as well.
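The algorithm of FIG. 9(C) and the textual path of FIG. 9(D) may be sketched in Python as follows. This is a minimal sketch under stated assumptions: the symbols "[", "," and "]" chosen for the path start, separator and path end, the function names build_path and assign_variations, and the sample events are all hypothetical, and variation identifiers are simply assigned as running numbers held in memory rather than in a database table.

```python
PATH_START, SEPARATOR, PATH_END = "[", ",", "]"  # assumed symbols


def build_path(event_type_ids):
    """Build a textual path from a chronologically ordered list of type ids."""
    parts = [str(type_id) for type_id in event_type_ids]
    for part in parts:
        # The reserved symbols must not occur inside an event type identifier
        if any(symbol in part for symbol in (PATH_START, SEPARATOR, PATH_END)):
            raise ValueError(f"reserved symbol in event type id: {part!r}")
    return PATH_START + SEPARATOR.join(parts) + PATH_END


def assign_variations(events):
    """events: iterable of (case_id, event_type_id, timestamp).

    Returns (variation_ids, case_variation), where variation_ids maps a
    path to its variation id and case_variation maps a case id to the id
    of its variation.
    """
    by_case = {}
    for case_id, type_id, timestamp in events:  # collect unique case ids
        by_case.setdefault(case_id, []).append((timestamp, type_id))

    variation_ids, case_variation = {}, {}
    for case_id in sorted(by_case):
        # Sort the events of the case by time stamp (ties by type id)
        ordered = [type_id for _, type_id in sorted(by_case[case_id])]
        path = build_path(ordered)
        if path not in variation_ids:  # no identical path yet: new variation
            variation_ids[path] = len(variation_ids) + 1
        case_variation[case_id] = variation_ids[path]
    return variation_ids, case_variation


# Hypothetical events; case 3 arrives out of chronological order
events = [
    (1, 10, 1), (1, 20, 2), (1, 30, 3),
    (2, 10, 5), (2, 30, 9),
    (3, 20, 8), (3, 10, 7), (3, 30, 9),
]
variation_ids, case_variation = assign_variations(events)
```

In this sample, cases 1 and 3 traverse the same event types in the same order and therefore share one variation, while case 2 forms a second variation.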

FIG. 9(E) shows an illustrative example of a query which can be processed by an SQL server. Specifically, the SQL script denoted by reference numeral 9-500 finds the most common process variation (chain of event type identifiers for a process instance). As stated earlier, a benefit of the above-described features is that the queries can be processed by the SQL server of the process analysis system 1-300 (item PA-S in FIG. 2), which is why the communication links between the clients and the SQL server will not constitute a bottleneck. If the above-described features are not used, implementation of the corresponding queries will be much more complicated, and it is not easy to see how processing could be performed entirely in the SQL server of the process analysis system.
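A query of this kind may be illustrated by the following Python sketch, which uses an in-memory SQLite database as a stand-in for the SQL server. The sketch does not reproduce the script 9-500; the table names variation and process_case, their columns and the sample rows are assumptions. Note that SQLite limits the result with LIMIT 1, whereas Microsoft SQL Server would use TOP 1 for the same purpose.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical variation table: one row per distinct textual path
    CREATE TABLE variation (id INTEGER PRIMARY KEY, path TEXT UNIQUE);
    -- Hypothetical process instance table with a foreign key to variation
    CREATE TABLE process_case (
        id INTEGER PRIMARY KEY,
        variation_id INTEGER REFERENCES variation(id)
    );
    INSERT INTO variation VALUES (1, '[10,20,30]'), (2, '[10,30]');
    INSERT INTO process_case VALUES (1, 1), (2, 2), (3, 1), (4, 1);
    """
)

# The grouping and counting run entirely inside the database engine;
# only the single result row travels to the caller.
most_common = conn.execute(
    """
    SELECT v.path, COUNT(*) AS case_count
    FROM process_case AS c
    JOIN variation AS v ON v.id = c.variation_id
    GROUP BY v.id, v.path
    ORDER BY case_count DESC
    LIMIT 1
    """
).fetchone()
```

Because each process instance carries only a reference to its variation, the query needs a single join and a grouping step, regardless of how many events each process instance contains.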

FIG. 10 illustrates the concept of the causal model 10-100, which is a key feature in the fifth feature set and an optional feature in other feature sets. Arrows 10-102 and 10-104 illustrate the fact that the causal model 10-100 includes or is based on the event data 8-200 and external data 8-300, respectively. Arrow 10-106 illustrates how one event type 8-200₁ influences another event type 8-200₂, and such mutual influence relations between event types are stored in and utilized by the causal model 10-100. Arrow 10-108 illustrates the fact that in some implementations the causal model 10-100 is able to predict probabilities for alternative outcomes in a sequence of events generally denoted by reference numeral 10-200. In the example shown, event En, for which there are m possible outcomes, has a sequence of k prior events (events En-k through En-1). The m possible outcomes for event En are denoted En₁ through En_(m). Based on the causal model 10-100, including the statistical frequencies of the event instances, the server PA-S is able to compute probabilities P(En₁) through P(En_(m)) for the respective outcomes En₁ through En_(m) for the next event En, assuming that the event En is preceded by the sequence of k prior events En-k through En-1. In some implementations the creation of the causal model includes usage of advanced data mining algorithms or neural networks.
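One simple way to estimate such probabilities from statistical frequencies alone may be sketched as follows. This Python sketch is an assumption-laden illustration and not the causal model 10-100 itself: it merely counts which event types follow a given sequence of k prior event types in hypothetical historical sequences and normalizes the counts into relative frequencies. The function name next_event_probabilities and the sample data are hypothetical.

```python
from collections import Counter


def next_event_probabilities(sequences, prior):
    """Estimate P(next event | k prior events) from relative frequencies.

    sequences: chronologically ordered lists of event types, one per
    process instance. prior: the k event types assumed to precede the
    next event. Returns a dict mapping each observed outcome to its
    estimated probability (empty if the prior sequence never occurs).
    """
    prior = tuple(prior)
    k = len(prior)
    outcomes = Counter()
    for sequence in sequences:
        for i in range(k, len(sequence)):
            # Does the window of k events just before position i match?
            if tuple(sequence[i - k:i]) == prior:
                outcomes[sequence[i]] += 1
    total = sum(outcomes.values())
    if total == 0:
        return {}
    return {event: count / total for event, count in outcomes.items()}


# Hypothetical historical process instances
history = [
    ["order", "delivery", "invoice"],
    ["order", "delivery", "return"],
    ["order", "delivery", "invoice"],
    ["order", "invoice"],
]

probabilities = next_event_probabilities(history, ["order", "delivery"])
```

In this sample, two of the three process instances in which "order" is followed by "delivery" continue with "invoice", so the estimated probability of "invoice" is two thirds and that of "return" is one third.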

It will be apparent to a person skilled in the art that, as the technology advances, the inventive concept can be implemented in various ways. The invention and its embodiments are not limited to the examples described above but may vary within the scope of the claims.

We claim:
 1. A method for analyzing information derived from event data by a computer-implemented analysis system, which comprises a server and one or more clients, wherein the event data describes a real-world process the execution of which is supported by at least one information management system but the real-world process is not directly connectable with the computer-implemented analysis system, the method comprising the following acts performed by the server: importing event instance data comprising a plurality of event instance data sets from the at least one information management system, wherein each event instance data set comprises one or more attributes describing an event instance in the real-world process; determining for each imported event instance data set a corresponding process instance based on at least the attributes of the imported event instance data set; determining at least one event order attribute for each imported event instance data set based on other event instance data sets corresponding to the same process instance; forming an analysis result set based on at least the event instance data sets and at least one event order attribute; sending the analysis result set to one or more clients; the method further comprising: the one or more clients presenting an analysis utilizing the analysis result set, and in response to receiving an input that is related to the analysis, sending a request to the server; the server forming filtered event instance data by excluding event instance data sets from the analysis based on the input received from the one or more clients; repeating at least the above steps of forming the analysis result, sending the analysis result and presenting a revised analysis based on at least the filtered event instance data.
 2. The method according to claim 1, further comprising: the server creating and storing a view for each filtered set of the event instance data sets, wherein the stored view includes at least information indicating which event instance data sets are included or excluded from the analysis.
 3. The method according to claim 2, wherein the stored view includes analysis parameters usable for recreating a specific analysis result.
 4. The method according to claim 2, wherein one or more additional attributes are stored for stored view, the one or more additional attributes indicating one or more of the following: access rights to one or more users, descriptive information for the view, results relating to user voting for rating of the view and user comments relating to the view.
 5. The method according to claim 4, further comprising: the server storing a criterion for at least one view wherein, if the criterion is triggered by importing at least one new event instance data set, the server initiates an action specified for the criterion.
 6. A computer-implemented analysis system comprising a server for supporting one or more clients, the server comprising: at least one processing unit; memory for storing applications and data wherein the memory comprises program code instructions for instructing the at least one processing unit to carry out the following steps: importing event instance data comprising a plurality of event instance data sets from the at least one information management system, wherein each event instance data set comprises one or more attributes describing an event instance in the real-world process; determining for each imported event instance data set a corresponding process instance based on at least the attributes of the imported event instance data set; determining at least one event order attribute for each imported event instance data set based on other event instance data sets corresponding to the same process instance; forming an analysis result set based on at least the event instance data sets and at least one event order attribute; and sending the analysis result set to one or more clients.
 7. The computer-implemented analysis system according to claim 2, wherein the server comprises multiple processing units and a load-balancing unit for distributing processing load among the multiple processing units.
 8. A computer-readable memory comprising program code instructions for a server of a process analysis system that also comprises one or more clients, wherein the program code instructions, when executed by the server, cause the server to perform the steps of: importing event instance data comprising a plurality of event instance data sets from the at least one information management system, wherein each event instance data set comprises one or more attributes describing an event instance in the real-world process; determining for each imported event instance data set a corresponding process instance based on at least the attributes of the imported event instance data set; determining at least one event order attribute for each imported event instance data set based on other event instance data sets corresponding to the same process instance; forming an analysis result set based on at least the event instance data sets and at least one event order attribute; and sending the analysis result set to one or more clients.